POLYKETIDES AND THEIR SYNTHESIS



The present invention relates to processes and 
materials (including enzyme systems, nucleic acids, 
vectors and cultures) for preparing polyketides, 
particularly polyethers but including polyenes, 
macrolides and other polyketides by recombinant 
synthesis, and to the polyketides so produced, 
particularly novel polyketides. (N.B. the term
"polyketide" is being used in its conventional sense to
include structures notionally derived by the reduction
and/or other processing or modification of one or more
ketide units.) Furthermore, the invention provides the
entire nucleic acid sequence of the biosynthetic gene 
cluster that governs the production of the ionophoric 
antibiotic polyether polyketide monensin in Streptomyces 
cinnamonensis, and the use of all or part of the cloned 
DNA first, in the specific detection of other polyether 
biosynthetic gene clusters; secondly in the engineering 
of mutant strains of S. cinnamonensis and of other 
actinomycetes which are suitable host strains for the 
high level production of novel recombinant polyketides; 
and thirdly in the provision of recombinant biosynthetic 
genes which lead to such novel polyketide products. 

Polyketides are a large and structurally diverse 
class of natural products that includes many compounds 
possessing antibiotic or other pharmacological 
properties, such as erythromycin, tetracyclines, 
rapamycin, avermectin, monensin, epothilones and FK506.
In particular, polyketides are abundantly produced by
Streptomyces and related actinomycete bacteria. They are 
synthesised by the repeated stepwise condensation of 
acylthioesters in a manner analogous to that of fatty 
acid biosynthesis. The greater structural diversity found 
among natural polyketides arises from the selection of
(usually) acetate or propionate as "starter" or
"extender" units; and from the differing degree of
processing of the β-keto group observed after each
condensation. Examples of processing steps include
reduction to β-hydroxyacyl-, reduction followed by
dehydration to 2-enoyl-, and complete reduction to the
saturated acylthioester. The stereochemical outcome of
these processing steps is also specified for each cycle
of chain extension. In addition, the biosynthetic
pathways to many polyketides involve additional
enzyme-catalysed modifications which may include: methylation by
O- and C-methyltransferases, hydroxylation by cytochrome
P450 enzymes, other oxidation or reduction processes, and
the biosynthesis and attachment of novel sugars and/or
deoxy sugars.
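
The alternative degrees of processing of the β-keto group described
above can be summarised as a small lookup. The following Python sketch is
illustrative only. It assumes the conventional abbreviations KR
(ketoreductase), DH (dehydratase) and ER (enoylreductase) for the
reductive activities, and it assumes that the steps act strictly in that
order; the function name beta_carbon_state is hypothetical.

```python
# Hypothetical helper: maps the reductive activities assumed to act in one
# chain-extension cycle onto the resulting state of the beta-carbon.
REDUCTIVE_DOMAINS = ("KR", "DH", "ER")  # conventional abbreviations (assumed)


def beta_carbon_state(domains):
    """Return the beta-carbon processing level for one extension cycle."""
    active = set(domains)
    unknown = active - set(REDUCTIVE_DOMAINS)
    if unknown:
        raise ValueError(f"unrecognised domains: {sorted(unknown)}")
    # Simplifying assumption: the steps are strictly sequential, so DH acts
    # only on the KR product and ER only on the DH product.
    if "DH" in active and "KR" not in active:
        raise ValueError("DH requires KR in this simplified model")
    if "ER" in active and "DH" not in active:
        raise ValueError("ER requires DH in this simplified model")
    if "ER" in active:
        return "saturated acylthioester"
    if "DH" in active:
        return "2-enoyl"
    if "KR" in active:
        return "beta-hydroxyacyl"
    return "beta-ketoacyl"


if __name__ == "__main__":
    for combo in ([], ["KR"], ["KR", "DH"], ["KR", "DH", "ER"]):
        print(combo, "->", beta_carbon_state(combo))
```

The four accepted combinations correspond to the unprocessed β-keto
group and the three processing outcomes listed above. Stereochemical
outcome is deliberately not modelled.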

The biosynthesis of polyketides is initiated by a 
group of chain-forming enzymes known as polyketide 
synthases. Two classes of polyketide synthase (PKS) have 
been described in actinomycetes. One class, named Type I
PKSs, represented by the PKSs for the macrolides
erythromycin, oleandomycin, avermectin and rapamycin,
consists of a different set or "module" of enzymes for
each cycle of polyketide chain extension. (For examples
see Cortes, J. et al. Nature (1990) 348:176-178; Donadio,
S. et al. Science (1991) 252:675-679; Swan, D.G. et al.
Mol. Gen. Genet. (1994) 242:358-362; MacNeil, D.J. et al.
Gene (1992) 115:119-125; Schwecke, T. et al. Proc. Natl.
Acad. Sci. USA (1995) 92:7839-7843.)

The term "extension module" as used herein refers to 
the set of contiguous domains, from a β-ketoacyl-ACP
synthase ("KS") domain to the next acyl carrier protein
("ACP") domain, which accomplishes one cycle of
polyketide chain extension. The term "loading module" is
used to refer to any group of contiguous domains which
accomplishes the loading of the starter unit onto the PKS
and thus renders it available to the KS domain of the
first extension module. The length of polyketide formed
has been altered, in the case of erythromycin
biosynthesis, by specific relocation using genetic
engineering of the enzymatic domain of the
erythromycin-producing PKS that contains the chain releasing
thioesterase/cyclase activity (Cortes, J. et al. Science
(1995) 268:1487-1489; Kao, C.M. et al. J. Am. Chem. Soc.
(1995) 117:9105-9106).
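
The definitions of "extension module" and "loading module" given above
can be applied mechanically to an ordered list of domains. The following
Python sketch is a minimal illustration under stated assumptions: the
function name split_modules is hypothetical, domains are represented
simply as strings, and the example domain order is an assumed
DEBS1-like arrangement chosen for illustration rather than taken from
any sequence data.

```python
def split_modules(domains):
    """Split an ordered domain list into (loading, extension, trailing).

    An extension module runs from a "KS" domain to the next "ACP" domain.
    Everything before the first KS is treated as the loading module, and
    any domain found after a closing ACP but outside a KS..ACP run (for
    example a terminal thioesterase) is reported as trailing.
    """
    domains = list(domains)
    if "KS" not in domains:
        return domains, [], []
    first_ks = domains.index("KS")
    loading = domains[:first_ks]
    extension, trailing, current = [], [], None
    for name in domains[first_ks:]:
        if name == "KS":
            if current is not None:
                raise ValueError("KS found before the previous module reached its ACP")
            current = ["KS"]
        elif current is None:
            trailing.append(name)
        else:
            current.append(name)
            if name == "ACP":
                extension.append(current)
                current = None
    if current is not None:
        raise ValueError("final extension module has no ACP")
    return loading, extension, trailing


# Assumed, illustrative domain order: a loading module (AT, ACP) followed
# by two extension modules.
EXAMPLE = ["AT", "ACP", "KS", "AT", "KR", "ACP", "KS", "AT", "KR", "ACP"]

if __name__ == "__main__":
    loading, extension, trailing = split_modules(EXAMPLE)
    print("loading module:", loading)
    print("extension modules:", extension)
    print("trailing domains:", trailing)
```

For the assumed example the sketch reports one loading module and two
extension modules, each extension module beginning with KS and ending
with ACP.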

In-frame deletion of the DNA encoding part of the
ketoreductase domain in module 5 of the
erythromycin-producing PKS (also known as 6-deoxyerythronolide B
synthase, DEBS) has been shown to lead to the formation
of the erythromycin analogues
5,6-dideoxy-3-α-mycarosyl-5-oxoerythronolide B,
5,6-dideoxy-5-oxoerythronolide B and
5,6-dideoxy,6-β-epoxy-5-oxoerythronolide B (Donadio, S.
et al. Science (1991) 252:675-679). Likewise, alteration
of active site residues in the enoylreductase domain of
module 4 in DEBS, by genetic engineering of the
corresponding PKS-encoding DNA and its introduction into
Saccharopolyspora erythraea, led to the production of
6,7-anhydroerythromycin C (Donadio, S. et al. Proc. Natl.
Acad. Sci. USA (1993) 90:7119-7123).

International Patent Application number WO 93/13663 
describes additional types of genetic manipulation of the 
DEBS genes that are capable of producing altered 
polyketides. However, many such attempts are reported to
have been unproductive (Hutchinson, C.R. and Fujii, I.
Annu. Rev. Microbiol. (1995) 49:201-238, at p. 231). The
complete DNA sequence of the genes from Streptomyces
hygroscopicus that encode the modular Type I PKS
governing the biosynthesis of the macrocyclic
immunosuppressant polyketide rapamycin has been disclosed
(Schwecke, T. et al. (1995) Proc. Natl. Acad. Sci. USA
92:7839-7843). The DNA sequence is deposited in the 
EMBL/Genbank Database under the accession number X86780. 

WO 98/01546 discloses that a PKS gene assembly 
(particularly of Type I) encodes a loading module which 
is followed by at least one extension module. In the
erythromycin PKS, the first open reading frame encodes the
first multi-enzyme or cassette (DEBS1), which consists of
three modules: the loading module (ery-load) and two
extension modules (modules 1 and 2). The loading module
comprises an acyltransferase and an acyl carrier protein.
This may be contrasted with Figure 1 of WO 93/13663
(referred to above), which shows ORF1 as only two modules,
the first of which is in fact both the loading module and
the first extension module.

WO 98/01546 describes in general terms the 
production of a hybrid PKS gene assembly comprising a
loading module and at least one extension module. It also
describes (see also Marsden, A.F.A. et al. Science (1998)
279:199-202) construction of a hybrid PKS gene assembly
by grafting the wide-specificity loading module for the 
avermectin-producing polyketide synthase onto the first 
multi-enzyme component (DEBS1) for the erythromycin PKS 
in place of the normal loading module. Certain novel 
polyketides can be prepared using the hybrid PKS gene 
assembly, as described for example in WO 98/01571. 

WO 98/01546 further describes the construction of a 
hybrid PKS gene assembly by grafting the loading module 
for the rapamycin-producing polyketide synthase onto the 
first multi-enzyme component (DEBS1) for the erythromycin 
PKS in place of the normal loading module. The loading 
module of the rapamycin PKS differs from the loading 
modules of DEBS and the avermectin PKS in that it 
comprises a CoA ligase domain, an enoylreductase ("ER")
domain and an ACP, so that suitable organic acids
including the natural starter unit
3,4-dihydroxycyclohexane carboxylic acid may be activated in
situ on the PKS loading domain and, with or without
reduction by the ER domain, transferred to the ACP for
intramolecular loading of the KS of extension module 1
(Schwecke, T. et al. Proc. Natl. Acad. Sci. USA (1995)
92:7839-7843). WO 98/51695 and WO 98/49315 describe
additional types of genetic manipulation of the DEBS 
genes that are capable of producing altered polyketides. 

The second class of PKS, named Type II PKSs, is 
represented by the synthases for aromatic compounds. Type 
II PKSs contain only a single set of enzymatic activities
for chain extension and these are re-used as appropriate
in successive cycles (Bibb, M.J. et al. EMBO J. (1989)
8:2727-2736; Sherman, D.H. et al. EMBO J. (1989)
8:2717-2725; Fernandez-Moreno, M.A. et al. J. Biol. Chem. (1992)
267:19278-19290). The "extender" units for the Type II
PKSs are usually acetate units, and the presence of
specific cyclases dictates the preferred pathway for
cyclisation of the completed chain into an aromatic
product (Hutchinson, C.R. and Fujii, I. Annu. Rev.
Microbiol. (1995) 49:201-238). Hybrid polyketides have
been obtained by the introduction of cloned Type II PKS
gene-containing DNA into another strain containing a
different Type II PKS gene cluster, for example by
introduction of DNA derived from the gene cluster for
actinorhodin, a blue-pigmented polyketide from
Streptomyces coelicolor, into an anthraquinone
polyketide-producing strain of Streptomyces galilaeus
(Bartel, P.L. et al. J. Bacteriol. (1990) 172:4816-4826).

The minimal number of domains required for 
polyketide chain extension on a Type II PKS when 
expressed in a Streptomyces coelicolor host cell (the 
"minimal PKS") has been defined for example in WO 
95/08548 as containing the following three polypeptides 
which are products of the actI genes: firstly KS;
secondly a polypeptide termed the CLF with end-to-end
amino acid sequence similarity to the KS but in which the
essential active site residue of the KS, namely a
cysteine residue, is substituted either by a glutamine
residue or, in the case of the PKS for a spore pigment
such as the whiE gene product (Davis, N.K. and Chater,
K.F. Mol. Microbiol. (1990) 4:1679-1691) by a glutamic
acid residue; and finally an ACP. The CLF has been stated
(for example in WO 95/08548) to be a factor that
determines the chain length of the polyketide chain that
is produced by the minimal PKS. However it has been found
(Shen, B. et al. J. Am. Chem. Soc. (1995) 117:6811-6821)
that when the CLF for the octaketide actinorhodin is used
to replace the CLF for the decaketide tetracenomycin in
host cells of Streptomyces glaucescens, the polyketide
product is not found to be altered from a decaketide to
an octaketide, so the exact role of the CLF remains
unclear. An alternative nomenclature has been proposed in
which KS is designated KSα and CLF is designated KSβ, to
reflect this lack of knowledge (Meurer, G. et al.
Chemistry & Biology (1997) 4:433-443). The mechanism by
which acetate starter units and acetate extender units
are loaded onto the Type II PKS is not known, but it is
speculated that the malonyl-CoA:ACP acyltransferase of
the fatty acid synthase of the host cell can fulfil the
same function for the Type II PKS (Revill, W.P. et al. J.
Bacteriol. (1995) 177:3946-3952).

WO 95/08548 describes the replacement of 
actinorhodin PKS genes by heterologous DNA from other 
Type II PKS gene clusters, to obtain hybrid polyketides.
It also describes the construction of a strain of
Streptomyces coelicolor which substantially lacks the
native gene cluster for actinorhodin, and the use in that
strain of a plasmid vector pRM5 derived from the low-copy
number vector SCP2* isolated from Streptomyces coelicolor
(Bibb, M.J. and Hopwood, D.A. J. Gen. Microbiol. (1981)
126:427-442) and in which heterologous PKS-encoding DNA
may be expressed under the control of the divergent
actI/actIII promoter region of the actinorhodin gene cluster
(Fernandez-Moreno, M.A. et al. J. Biol. Chem. (1992)
267:19278-19290). The plasmid pRM5 also contains DNA from
the actinorhodin biosynthetic gene cluster encoding the
gene for a specific activator protein, ActII-orf4. The
ActII-orf4 protein is required for transcription of the
genes placed under the control of the actI/actIII
bidirectional promoter and activates gene expression
during the transition from growth to stationary phase in
the vegetative mycelium (Hallam, S.E. et al. Gene (1988)
74:305-320).

Type II clusters in Streptomyces are known to be
activated by pathway-specific activator genes (Narva,
K.E. and Feitelson, J.S. J. Bacteriol. (1990)
172:326-333; Stutzman-Engwall, K.J. et al. J. Bacteriol. (1992)
174:144-154; Fernandez-Moreno, M.A. et al. Cell (1991)
66:769-780; Takano, E. et al. Mol. Microbiol. (1992)
6:2797-2804; Gramajo, H.C. et al. Mol. Microbiol. (1993)
7:837-845). The DnrI gene product complements a mutation
in the actII-orf4 gene of S. coelicolor, implying that
DnrI and ActII-orf4 proteins act on similar targets. A
gene (srmR) has been described (EP 0 524 832 A2) that is
located near the Type I PKS gene cluster for the
macrolide polyketide spiramycin. This gene specifically
activates the production of the macrolide antibiotic
spiramycin, but no other examples have been found of such
a gene. Also, no homologues of the ActII-orf4/DnrI/RedD
family of activators have been described that act on Type
I PKS genes. WO 98/01546 describes the use of the
ActII-orf4 family of activators in conjunction with their
cognate promoters (e.g. actII-orf4 with the actI promoter)
in a heterologous actinomycete to obtain high level
expression of recombinant Type I polyketide synthase
genes.

Although large numbers of therapeutically important 
polyketides have been identified, there remains a need to 
obtain novel polyketides that have enhanced properties or 
possess completely novel bioactivity. The complex 
polyketides produced by Type I PKSs are particularly 
valuable, in that they include compounds with known 
utility as antihelminthics, insecticides, 
immunosuppressants, antifungal agents or antibacterial 
agents. Because of their structural complexity, such 
novel polyketides are not readily obtainable by total 
chemical synthesis, nor by chemical modifications of 
known polyketides. 

There is also a need to develop reliable and 
specific ways of deploying individual genes and portions 
of genes in practice so that all, or a large fraction, of 
hybrid PKS genes that are constructed, are viable and 
produce the desired polyketide product. This includes the 
development of advantageous host strains for expression 
of such genes. For example many polyketides are rendered 
bioactive by the action of further enzymes other than the 
polyketide synthase, and host strains that contain and 
are able to express the genes for such enzymes are 
particularly convenient for the efficient synthesis of
the bioactive material. In those cases where the 
construction of a known or a novel polyketide requires 
specialised precursors, host strains containing and able 
to express the genes for key enzymes that enhance the 
production of such specialised precursors are equally 
valuable and desirable. There is also a need to develop 
rational methods of increasing the expression level of 
all the genes required for production of a specific 
polyketide. Clearly also a host cell which is 
advantageous for the above reasons, and/or because of 
other favourable characteristics including but not 
limited to its speed of growth, excellent handling 
characteristics in fermentation, and ease of 
transformation with DNA by various techniques, can be 
made even more favourable by the cloning into that cell 
of such auxiliary genes for polyketide modification, or 
gene activation, or post-translational modification, or 
precursor supply.

The DNA sequences have been disclosed for several 
Type I PKS gene clusters that govern the production of 
16-membered macrolide polyketides, including the tylosin 
PKS from Streptomyces fradiae (application EP 0 791 655
A2), the niddamycin PKS from Streptomyces caelestis
(Kakavas, S.J. et al. J. Bacteriol. (1997) 179:7515-7522)
and the spiramycin PKS from Streptomyces ambofaciens
(application EP 0 791 655 A2). DNA sequences have also
been disclosed for Type I PKS gene clusters that govern
the production of further complex polyketides, for
example rifamycin from Amycolatopsis mediterranei (WO
98/07868), and soraphen from Sorangium cellulosum (US
5716849), but so far no DNA sequence has been disclosed
for one of the most widespread and important classes of 
complex polyketides, the polyethers.

Polyethers form an important group of complex
polyketide antibiotics (Westley, J.W. in "Antibiotics IV.
Biosynthesis" (Corcoran, J.W., Ed.), Springer-Verlag, New
York (1981) p. 41-73). They are polyoxygenated carboxylic
acids which act as selective ionophores transporting
cations across the cell membrane of target cells and
thereby causing depolarisation and cell death. Certain
polyethers including monensin, lasalocid and tetronasin
are in widespread use in animal husbandry as
coccidiostats (principally targeted against Eimeria
spp.) and as growth promoters. Polyethers have also been
reported to be active in vitro and in vivo against the
malarial parasite Plasmodium falciparum (Gumila, C. et
al. Antimicrobial Agents and Chemotherapy (1997)
41:523-529).

Polyethers contain multiple asymmetric centres and
are characterised by the presence of tetrahydrofuran and
tetrahydropyran rings, producing a characteristic shape
which is non-polar on its outer surface and therefore
well adapted for transport of material across bacterial
membranes, and which provides on its inner surface polar
coordinating ligands for a centrally-bound metal ion. In
addition to tetrahydrofuran and tetrahydropyran rings,
other groups which are often present include spiroketal,
dispiroketal, and substituted benzoic acid moieties and
occasionally other groups, for example a tetronic acid or
a 6-membered carbocyclic ring.

Monensins A and B are produced by the actinomycete 
Streptomyces cinnamonensis. Their structures are shown in
Figure 1. Monensin B differs from monensin A only in the
presence of a methyl sidechain at C-16 rather than an
ethyl sidechain. Monensin selectively binds and
transports sodium ions. In addition to its antibacterial
and antifungal properties monensin has some activity
against protozoal parasites such as the malarial parasite
Plasmodium falciparum. Although the structures of
polyethers differ significantly from those of other
complex polyketides such as the polyhydroxylated and
polyene macrolides, their biosynthesis appears to take
place by a metabolic pathway which has many common
elements. Thus experiments using carbon-14-labelled
precursors have shown that monensin A is synthesised from
five acetate, one butyrate and seven propionate units
(Day, L.E. et al. Antimicrob. Agents Chemother. (1973)
4:410-414). Similarly, experiments using precursors
doubly-labelled with carbon-13 and oxygen-18 have shown
that oxygens (O)1, (O)3, (O)4, (O)5, (O)6 and (O)10 of
monensin arise from the carboxylate oxygens of either
propionate or acetate, while growth in the presence of
oxygen-18-labelled oxygen gas demonstrated that the three
remaining ether oxygens (O)7, (O)8 and (O)9 are derived
from molecular oxygen (Cane, D.E. et al. J. Am. Chem.
Soc. (1981) 103:5962-5965; Cane, D.E. et al. J. Am. Chem.
Soc. (1982) 104:7274-7281; Ajaz, A.A. and Robinson,
J.A. J. Chem. Soc. Chem. Commun. (1983) 12:679-680).
These findings have been rationalised by proposing that
the biosynthesis of monensin proceeds via an acyclic
triene intermediate (1) in which the geometry of all
three carbon-carbon double bonds is E (entgegen) rather
than Z (zusammen). The triene is then proposed to be
subject to epoxidation to a tri-epoxide (2) and then ring
opening is proposed to occur with concomitant sequential
formation of the five ether rings as shown in Figure 2A.
Such a biosynthetic pathway, first mooted by Westley in
1974 (Westley, J.W. et al. J. Antibiot. (1974)
27:597-604), accounts for the observed stereochemistry at the
multiple asymmetric centres in monensin (Cane, D.E. et
al. J. Am. Chem. Soc. (1982) 104:7274-7281; Sood, G.R. et
al. J. Chem. Soc. Chem. Commun. (1984) 21:1421-1424), and
analogous schemes can be used to account for the
biosynthesis of other known polyethers, such as lasalocid
A (Hutchinson, C.R. et al. J. Am. Chem. Soc. (1981)
103:5953-5956), tetronasin (ICI 139603) (Demetriadou,
A.K. et al. J. Chem. Soc. Chem. Commun. (1985) 7:408-410)
and narasin (Spavold, Z. et al. Tetrahedron Letters
(1986) 27:3299-3302). The hydroxylation at C-26 and the
introduction of an O-methyl group on oxygen 3 are
proposed to occur as late steps in the biosynthesis,
after formation of the polyether structure.

Unfortunately key aspects of the biosynthetic scheme 
shown in Figure 2A have so far eluded experimental 
confirmation. No biosynthetic intermediates have been 
isolated from mutants of S. cinnamonensis that are 
blocked in early stages of monensin production.
26-Deoxymonensin A has been isolated from a S. cinnamonensis
mutant partially blocked in monensin production
(Ashworth, D.M. et al. J. Antibiot. (1989) 42:1088-1099)
and 3-O-demethylmonensins A and B have been recovered as
minor components from the fermentation broth of a
monensin-producing strain (Pospisil, S. et al. J.
Antibiot. (1987) 40:555-557). When fed to cells of S.
cinnamonensis in radio-labelled form, neither
26-deoxymonensin A, nor 3-O-demethylmonensin A, nor
3-O-demethyl-26-deoxymonensin A was significantly
incorporated into monensin A (Ashworth, D.M. et al. J.
Antibiot. (1989) 42:1088-1099), either because they are
actively excluded or because these modifications in fact
occur earlier in the biosynthetic pathway so that these
metabolites are shunt products not readily converted into
the final antibiotic by the respective hydroxylase or
methyltransferase. Similarly, the putative all-(E)-triene
precursor (1) has been synthesised and shown not to
become incorporated into monensin when fed to growing
cells of S. cinnamonensis (Holmes, D.S. et al. Helv.
Chim. Acta (1990) 73:239-259). An alternative pathway has
been proposed, as shown in Fig 2B, based on the
transition-metal-mediated oxidation of 1,5-dienes (Walba,
D.M. and Edwards, P.D. Tetrahedron Lett. (1980)
21:3531-3534). The triene intermediate (4) would differ from
that of Figure 2A (1) only in that each carbon-carbon
double bond would have the (Z)-configuration (Townsend,
C.A. and Basak, A. Tetrahedron (1991) 47:2591-2602) and
not the (E)-configuration.

The genetic basis of secondary metabolite
biosynthesis resides essentially in the genes which code
for the individual biosynthetic enzymes and in the
regulatory elements which control the expression of the
biosynthetic genes. The genes encoding biosynthesis of
polyketides in actinomycetes have hitherto been found as
clusters of adjacent genes, ranging in size from
20 kilobasepairs (kbp) to over 100 kbp. The clusters
often contain specific regulatory genes and genes
conferring resistance of the producing strain to its own
antibiotic.

In various of its aspects the invention provides the 
following:

(1) a DNA sequence encoding at least one peptide
necessary for the biosynthesis of monensin, preferably
comprising one or more of the following genes: monBI,
monBII, monCI, monCII, monH, monRI, monRII, monT,
monAIX and monAX as depicted in the appended sequence
data or an allele or mutation thereof;

(2) a DNA sequence according to the first aspect
comprising all of the genes listed therein or an allele
or mutation thereof;

(3) a DNA sequence according to the first aspect
comprising the complete monensin gene cluster;

(4) a DNA sequence coding for one or more of the
peptides set out below, said peptide having the amino
acid sequence as set out in the appended sequence data or
being a variant thereof having the specified activity:

peptide      activity
monCII       epoxyhydrolase/cyclase
monE         S-adenosylmethionine-dependent methyltransferase
monT         monensin resistance gene
monRII       repressor protein
monAIX       thioesterase
monAI        polyketide synthase multienzyme
monAII       polyketide synthase multienzyme
monAIII      polyketide synthase multienzyme
monAIV       polyketide synthase multienzyme
monAVI       polyketide synthase multienzyme
monAVII      polyketide synthase multienzyme
monAVIII     polyketide synthase multienzyme
mon [...]    regulatory protein
monCI        flavin-dependent epoxidase
monBII       carbon-carbon double bond isomerase
monBI        carbon-carbon double bond isomerase
mon [...]    cytochrome P450 hydroxylase
monRI        activator protein
monAX        thioesterase


(5) a recombinant cloning or expression vector
comprising a DNA sequence according to any of aspects 1-4;

(6) a transformant host cell which has been
transformed to contain a DNA sequence according to any of
aspects 1-4 and is capable of expressing a corresponding
peptide;

(7) a hybridization probe comprising a polynucleotide
which binds specifically to a region of the monensin gene
cluster selected from monBI, monBII, monCI, monCII,
monH, monRI, monRII, monT, monAIX and monAX;

(8) use of a probe according to aspect (7) in a
method of detecting the presence of a gene cluster which
governs the synthesis of a polyether, and optionally
isolating a gene cluster detected thereby;

(9) use of a probe comprising a polynucleotide which
binds specifically to a gene responsible for levels of
activity of the monensin gene cluster, preferably a
regulatory gene, resistance gene or thioesterase gene,
more preferably the regulatory gene monRI, in a method of
detecting an analogous gene in a gene cluster of another
polyketide, preferably a polyether, and optionally
manipulating the gene detected thereby to alter the level
of expression of said other polyketide;

(10) a host cell, preferably Streptomyces
cinnamonensis, containing a heterologous gene under the
control of the monRI gene and a monensin promoter;

(11) use of a portion of the monensin gene cluster
having chain terminating activity, preferably comprising
at least one of monAIX and monAX or a mutant or allele
thereof having chain terminating activity, to effect chain
release of a peptide other than one required for monensin
biosynthesis;

(12) use of a portion of the monensin gene cluster
having carbon-carbon double bond isomerase activity,
preferably comprising at least one of monBI and monBII
or a mutant or allele thereof having isomerase activity to
provide a desired stereochemical outcome in the synthesis
of a polyketide other than monensin;

(13) a polypeptide encoded by a portion of the
monensin gene cluster, preferably comprising at least one
of monBI and monBII or a mutant or allele thereof,
having carbon-carbon double bond isomerase activity;

(14) an epoxidase enzyme encoded by monCI or a
derivative or variant thereof having epoxidase activity;

(15) a cyclase enzyme encoded by monCII or a
derivative or variant thereof having cyclase activity.

Some embodiments of the invention will now be 
described by way of example with reference to the 
accompanying drawings in which:

Fig 1 shows the structure of monensins A and B; 

Fig 2 illustrates proposed biosynthetic pathways; 

Fig 3 illustrates the proposed organization of the 
monensin polyketide synthase (PKS) enzyme complex; and 

Fig 4 illustrates the proposed organization of the 
monensin biosynthetic gene cluster. 

The overall gene organization of the monensin 
biosynthetic gene cluster, as shown in Fig 4, is similar 
to that previously found for many macrolide biosynthetic 
gene clusters, which have one or more open reading frames 
(ORFs) encoding large multifunctional PKSs flanked by 
other genes which encode functions required for the 
biosynthesis of the antibiotic. In the case of monensin, 
there is an unusually high number of distinct ORFs 
encoding PKS multi-enzymes (eight in total, labelled monAI 
to monAVIII) but there is again a separate module of 
enzymes for each cycle of polyketide chain extension, 
exactly as found for modular PKSs for macrolide 
biosynthesis (see Fig 3). Thus there are 12 condensations
predicted to be required for the production of the carbon 
skeleton of monensin, and in agreement with this there are 
found to be 12 extension modules of PKS enzymes 
distributed among the 8 PKS ORFs. However, as mentioned in 
detail below, the other genes in the monensin cluster
include genes which have not previously been found in any
other gene cluster for the biosynthesis of a complex
polyketide, and which are not significantly similar to any 
genes in published sequence databases. The cloned DNA for 
these genes is useful to allow the diagnosis that a 
polyketide biosynthetic gene cluster in any actinomycete, 
uncovered previously by conventional hybridization against 
a PKS gene probe from (say) the DEBS or some other 
characterised PKS gene cluster, is one that governs the 
synthesis of a polyether; and these genes are also 
valuable either singly or in combination as specific 
hybridization probes for the specific detection and 
isolation of additional polyether biosynthetic gene 
clusters. Examples of these previously-unknown genes are 
the genes monBI, monBII, monCI and monCII. In addition the
regulatory genes monH, monRI and monRII, the resistance
gene monT and the thioesterase genes monAIX and monAX are
all useful for the detection of analogous genes in other 
polyether clusters which are required for the rational 
manipulation of such genes in order to increase levels of 
the specific product. 
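
The agreement noted above between the number of precursor units and the
number of extension modules can be checked by simple arithmetic. The
following Python sketch is illustrative only. It assumes that one of the
thirteen units serves as the starter, that every unit contributes two
carbons to the backbone, and that the remaining carbons of propionate
and butyrate appear as methyl and ethyl branches respectively. The
names MONENSIN_A_UNITS and chain_statistics are hypothetical.

```python
# Carbons per precursor unit: acetate C2, propionate C3, butyrate C4.
CARBONS_PER_UNIT = {"acetate": 2, "propionate": 3, "butyrate": 4}

# Precursor units reported for monensin A from labelling experiments.
MONENSIN_A_UNITS = {"acetate": 5, "butyrate": 1, "propionate": 7}


def chain_statistics(units):
    """Derive simple counts for a polyketide chain from its unit composition."""
    n_units = sum(units.values())
    return {
        "units": n_units,
        # One unit is the starter; every further unit needs one condensation.
        "condensations": n_units - 1,
        # Assumption: each unit contributes two carbons to the backbone.
        "backbone_carbons": 2 * n_units,
        # Backbone plus methyl/ethyl branch carbons; later tailoring steps
        # such as O-methylation are not counted.
        "total_carbons": sum(CARBONS_PER_UNIT[name] * count
                             for name, count in units.items()),
    }


if __name__ == "__main__":
    for key, value in chain_statistics(MONENSIN_A_UNITS).items():
        print(f"{key}: {value}")
```

Under these assumptions thirteen units give 12 condensations, matching
the 12 extension modules, and a 26-carbon backbone, which is consistent
with the numbering of the hydroxylated position as C-26.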

The cloned and sequenced cluster of genes for 
monensin biosynthesis is useful secondly in the 
engineering of mutant strains of S. cinnamonensis and of
other actinomycetes which are suitable strains for the
high level production of either natural or novel
recombinant polyketides. The sequence of the monensin
cluster disclosed here shows the surprising fact that the
gene cluster contains a gene monRI whose gene product has
an amino acid sequence highly similar to that of
actII-orf4, the pathway-specific activator gene which activates
the actI and other promoters of the actinorhodin
biosynthetic gene cluster of Streptomyces coelicolor. The
recognition of this aspect of the natural regulation of a
Type I PKS cluster is important and valuable because
first, as will be immediately obvious to the person
skilled in the art, it will be possible to increase the
yield of monensin by increasing the level of the activator
MonRI, either by placing the gene monRI under the control
of a powerful promoter or arranging for the presence
within the cells of one or more additional copies of the
monRI gene; secondly, it will be possible to use the monRI
gene as a specific hybridisation probe to locate similar
genes in other complex PKS gene clusters, especially other
polyether PKS gene clusters but also polyene and macrolide
gene clusters and all other Type I modular PKS gene
clusters; even in cases where (as for rapamycin and
erythromycin) no such gene has been previously found
within the currently accepted physical limits of the
relevant biosynthetic gene cluster. In such cases the
monRI gene probe might be expected to uncover the
activator even if it resides on the chromosome at some
distance from the main body of the gene cluster; and
simple experiments would then show whether the
activator(s) so uncovered are involved in regulation of
the biosynthesis of those particular metabolites; thirdly,
increasing the copy number of the monRI gene or of any of 
the activator genes uncovered will tend to increase the 
yield of a heterologous polyketide by "crosstalk" where 

30 the activator mimics the presence of the normal activator 



for the transcription of the genes for that heterologous 
polyketide synthase. It is clear from recently published 
work (Wietzorrek, A. and Bibb, M. Mol. Microbiol. (1997) 
25:1181-1184) that the ActII-orf4 family of activators 
exert their effects by binding to promoter regions within 
the target gene cluster, so it will be possible to use the 
monRI gene together with monensin promoter regions to 
drive the high-level transcription and translation of 
heterologous genes in Streptomyces cinnamonensis, and
perhaps in other host strains too; such genes need not be 
PKS genes or even involved in polyketide biosynthesis. 
Monensin promoter regions are found at the 5' end of genes 
or groups of genes in the cluster and their location is 
clear from the sequence analysis disclosed here. Thus a 
useful vector would provide the monensin promoter and the 
ribosome binding site and continue up to the start of the 
open reading frame, after which the monensin ORF naturally 
found there would be replaced by the heterologous gene. 
The relative strength of the monensin promoters can be 
readily determined using any one of a number of known 
promoter probes, i.e. genes whose expression gives rise to 
readily measurable and quantifiable effects, such as Green 
Fluorescent Protein (GFP), or beta-galactosidase in the
presence of a chromogenic substrate. It should be possible 
to mutate randomly the small region of the monensin 
promoters especially likely to interact with the MonRI 
activator (identified by the presence of tandem 
heptanucleotide repeats with a common consensus sequence 
between the various monensin promoters) (Wietzorrek, A. 
and Bibb, M. Mol. Microbiol. (1997) 25:1181-1184), and to
determine the optimal DNA sequence for the maximal
activation effect using either S. cinnamonensis
(preferably - in case there are other unknown factors that 
make the activation function better in this strain than in 
other heterologous systems), or even in another host 
actinomycete strain. If the natural monensin promoters 
were mutated to have this optimal recognition sequence, 
then this would further increase the production of 
monensin. By extension, the use of this modified monensin 
promoter in conjunction with the monRI gene in 
heterologous systems could form the basis of further 
improvements in expression of polyketide synthases or 
other genes, either by appropriate chromosomal alterations 
to introduce the altered promoter and also the monRI gene; 
or by provision of vectors containing these optimised
signals linked to specific genes and housed in suitable 
host cells. 
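
By way of illustration only, the search for tandem
heptanucleotide repeats in a promoter region can be sketched in
a few lines of Python. The function name, the spacer of four
nucleotides between the two repeat copies, the mismatch
tolerance and the toy sequence are all assumptions made for this
sketch and are not taken from the monensin sequence; the spacer
in particular should be set to whatever periodicity is actually
observed in the promoters under study.

```python
def find_heptamer_repeats(seq, spacer=4, max_mismatches=1):
    """Scan seq for a 7-mer that recurs after `spacer` nucleotides.

    Returns a list of (1-based position, first copy, second copy,
    mismatch count). The spacer and tolerance are illustrative
    assumptions, not values taken from the specification.
    """
    seq = seq.upper()
    k = 7                      # heptanucleotide
    step = k + spacer          # distance between the two copies
    hits = []
    for i in range(len(seq) - step - k + 1):
        first = seq[i:i + k]
        second = seq[i + step:i + step + k]
        mismatches = sum(a != b for a, b in zip(first, second))
        if mismatches <= max_mismatches:
            hits.append((i + 1, first, second, mismatches))
    return hits


# Hypothetical toy promoter: two exact copies of TCGAGCG, 4 nt apart.
toy_promoter = "GGGG" + "TCGAGCG" + "CCTA" + "TCGAGCG" + "TTTT"
exact_hits = find_heptamer_repeats(toy_promoter, spacer=4, max_mismatches=0)
```

Candidate repeats found in this way in several promoters could
then be aligned to derive a consensus, which is the region one
would target for random mutagenesis as described above.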

The sequencing of the monensin cluster has uncovered 
another strategy for gene regulation in such Type I 
clusters. The previously-sequenced genes for the rapamycin 
biosynthetic pathway in Streptomyces hygroscopicus 
included a gene of unknown function (rapH). A closely
similar gene has now been found in the monensin
biosynthetic gene cluster (monH), and it is clear from
this recurrence (and the comparison of the sequences with 
those of database proteins) that this gene is potentially 
an important DNA-binding sensor gene which acts to 
regulate the transcription of the cluster in concert with 
other regulatory signals. Simple experimentation is needed 
in order to define whether the gene is an activator, in
which case putting in another copy or increasing its
transcription will have the potential to increase 
polyketide biosynthesis; or alternatively the rapH gene 
product may be a negative regulator, whereupon deletion of 
this gene may release the biosynthetic pathway from this 
inhibitory effect and increase yields. 

There is a continuing need to develop new methods of 
high-level production of bioactive metabolites and other 
valuable gene products in actinomycetes. Streptomyces
cinnamonensis is a recognised and very valuable industrial
strain for the production of very high levels of monensin.
It is readily transformable with DNA by standard methods
of conjugation or of protoplast transformation; it is a
host for numerous known broad range plasmids including
well-known expression plasmids of both high- and low-copy
number; and it grows quickly relative to other
actinomycete strains (for example about three times faster
than wild type Saccharopolyspora erythraea, the
erythromycin producer, under comparable conditions) and
sporulates relatively easily. Heterologous polyketides can
be expressed in Streptomyces cinnamonensis using for 
example the low-copy number plasmid pCJR24 (which has no 
origin of replication active in actinomycetes so is 
maintained by integration into the chromosome) (Rowe, C. 
et al. Gene (1998) 216:215-223) or the related plasmid
pCJR29 in which the polyketide synthase gene(s) are placed
under the control of the actI promoter which is activated
by the ActII-orf4 activator; or alternatively the monAI
promoter can be substituted together with the MonRI
activator; or some other pairing of activator and cognate
promoter chosen from either a Type II or a Type I
polyketide synthase gene cluster. As an example, the wild
type strain of Streptomyces cinnamonensis has been used to
express the plasmid pCJR29 (Rowe, C. et al. Gene (1998)
216:215-223) containing as insert the three ORFs for the
PKS governing the production of 6-deoxyerythronolide B,
the macrolide precursor of erythromycin A in
Saccharopolyspora erythraea, these genes being placed
under the control of the pathway-specific actI promoter
from Streptomyces coelicolor together with its cognate
activator gene actII-orf4. The transformed strain when
cultivated in a suitable liquid medium produced
6-deoxyerythronolide B in good yield.

It is well known to the person skilled in the art 
that it is possible to use standard vectors unable to 
replicate in actinomycetes to introduce DNA into a 
Streptomyces cell, such DNA comprising two portions of 
contiguous DNA which are each identical to one of two 
portions of the cell's chromosome that are spaced up to 
100 kbp apart; and that through recombination between the 
incoming DNA and the chromosome occurring in both portions 
of DNA the net result is that the chromosomal sequence is
replaced by the defective sequence originally carried on
the incoming DNA. Such a procedure has been applied to the
monensin-producing strain of S. cinnamonensis as described
in detail below, and a strain of S. cinnamonensis has been 
obtained that carries a specific deletion in the monensin 
cluster and which is unable to produce the antibiotic. The 
use of such a strain facilitates the production of 
heterologous polyketides by removal of the background of
monensin production.

The multiple uses of portions of the cloned and 
sequenced DNA from the monensin cluster will readily occur 
to the person skilled in the art. A surprising feature of 
the PKS of the monensin cluster is an unusual mechanism of 
polyketide chain initiation. We have found that the 
monensin PKS loading module has three domains, which from 
the amino-terminus of the protein are: a KSq domain, an 
acyltransferase domain and an ACP domain. We have
uncovered this organisation in the PKS for the 14-membered
macrolide oleandomycin as well as in the monensin PKS, an 
organisation of the loading module previously only found 
for the 16-membered macrolides and in which the KSq domain 
(which looks like a ketosynthase or condensation domain 
except that the active site cysteine residue is 
substituted by a glutamine for which the single letter 
notation is Q) had been previously speculated to have no 
function. It was realised that the acyltransferase of the
loading module actually has malonyl-CoA and not acetyl-CoA 
as a substrate and that KSq is an active decarboxylase. It 
appears that a better discrimination can be achieved in 
the selection of the smaller acetate unit over propionate 
if the choice is made initially between methylmalonyl- and 
malonyl-CoA. 

An unprecedented feature of the monensin PKS genes is 
that no integral chain-terminating domain is present as a 
C-terminal appendage of the PKS extension module that 
catalyzes the twelfth and final chain extension. Because 
the product of the monensin PKS is a carboxylic acid, it 
would have been firmly predicted that chain release would
have been catalyzed by such a C-terminal domain containing
a "thioesterase" activity. Previously sequenced PKS gene 
sets have been of two sorts: first, those macrolide PKSs 
typified by erythromycin, spiramycin, tylosin, niddamycin 
which have a readily recognisable C-terminal 
"thioesterase" domain, which in these enzymes functions as 
a specific cyclase rather than releasing the polyketide 
product as a free carboxylic acid; secondly, those 
macrolide PKSs typified by rapamycin, FK506, and 
rifamycin, where there is an alternative and recognised 
mode of chain termination by transfer of the polyketide 
chain to an acceptor moiety, catalyzed by a specific 
enzyme (e.g. pipecolate incorporating enzyme for rapamycin
(Schwecke T. et al. Proc. Natl. Acad. Sci. USA (1995)
92:7839-7843) and FK506 (Motamedi H. and Shafiee A. Eur.
J. Biochemistry (1998) 256:528-534); arylamine synthetase
for rifamycin (August P.R. et al. Chemistry & Biology
(1998) 5:69-79)).

The monensin PKS surprisingly falls into neither 
category, and therefore seems to be the first example of a 
novel mode of chain termination. It is novel and 
noteworthy in this connection that the monensin PKS gene 
cluster contains two small genes that encode discrete, 
monofunctional thioesterase enzymes. Although many PKS 
gene clusters have been previously shown to contain one 
such discrete thioesterase, none have been shown to have 
two. The role of such thioesterases is not known, although
in the case of methymycin/pikromycin PKS, which has been
reported to be responsible for the biosynthesis of both
the 12-membered macrolide methymycin and the 14-membered
macrolide pikromycin (Xue Y.Q. et al. Proc. Natl. Acad. Sci. USA
(1998) 95:12111-12116) the disruption of this thioesterase
reportedly caused a ten-fold drop in the amount of both
macrolides produced. A similar finding has been reported
for the discrete thioesterase of the tylosin PKS gene
cluster (Cundliffe E. et al. Chemistry & Biology in
press). Additional copies of such thioesterases may
therefore accelerate the production of a specific
polyketide, but this has not yet been demonstrated.
However, the presence of the discrete thioesterase is not 
completely essential for polyketide production. 

It is highly desirable to have a broadly effective 
method of catalysing the release of polyketide gene 
products from a PKS as the free acid. The well-studied
integral thioesterase domain of the erythromycin PKS
has a broad specificity in cyclization to
form a lactone (assuming that a hydroxy group is present
in the growing polyketide chain at an appropriate
position), but hydrolysis to form the free acid is very
slow. The recognition of the unusual arrangement of the
monensin PKS means that it is now possible to harness 
either the entire PKS module that catalyses the twelfth 
and final extension cycle in monensin biosynthesis, or the 
C-terminal portion of it, and graft it onto a different 
polyketide synthase by genetic engineering, so as to allow 
the release mechanism characteristic of monensin to 
operate in a different context. The use of this portion 
only of the monensin PKS suffices to allow the novel 
mechanism of chain release to operate successfully. The 
speed of the polyketide chain hydrolysis in a given case
can depend on the additional presence of one or both of
the discrete thioesterase genes (monAIX and monAX) from 
the monensin gene cluster. The use of this novel method of 
chain termination represents a valuable way of generating 
a large number of novel engineered polyketides that are 
currently inaccessible, and ensuring that the products 
have a specified chain length. 

The genes monBI and monBII appear to encode very 
similar enzymes with significant amino acid sequence 
similarity to authentic ketosteroid isomerases which are 
known to catalyse the migration of an activated carbon- 
carbon double bond. The conservation of active site 
residues makes it very likely that these mon genes govern 
a reaction involving activated double bonds in the 
biosynthetic pathway to monensin and this surprising 
observation can be accommodated if the initial product of 
the polyketide chain growth on the monensin PKS is a 
linear precursor in which the double bonds were initially 
formed with a conventional trans or E (entgegen) geometry; 
but before the polyketide chain was extended by insertion 
of the next unit the monBI and/or the monBII gene 
product(s) catalyse the specific rearrangement of the
newly-created double bond into the cis or Z (zusammen) 
geometry. This new view of the monensin biosynthetic 
pathway allows the deduction that the monBI and monBII 
genes, perhaps in combination with specific portions of 
the monensin modules where they normally exert their 
effects (namely modules 3, 5 and 7) might be used in order 
to achieve the extremely desirable targeted biosynthesis
of novel polyketides containing double bonds with Z
geometry at specified point(s) along the chain. Thus for
example it should be possible to provide for the direct
biosynthesis of the C22-C23 cis or Z double bond in
avermectins, thus avoiding tedious and expensive chemical
conversion of an initial fermentation product into this
important antihelminthic. Only limited experimentation is
needed to see whether the monBI and/or monBII gene 
products are sufficient or whether the mon PKS at modules 
3, 5 and 7 forms part of the specific docking site(s) for 
the isomerases and therefore must also be used in the 
creation of the hybrid PKS that will insert the cis or Z 
double bond at the desired position. The substrate 
specificity of the isomerases need not be limited to 2,3- 
unsaturated thioesters. The purified enzymes could also be 
used to effect such isomerisations in vitro, depending on 
the position of the equilibrium or whether further enzymes 
are used to achieve the further transformation of the 
product as it is formed (vide infra).

The product of the monCI gene is a novel oxidative 
enzyme with some sequence similarity to authentic examples 
of such enzymes in the databases; and with a clearly 
definable role in the monensin biosynthetic pathway, the 
epoxidation of the double bonds at three separate 
positions in the initially-formed acyclic intermediate in 
monensin biosynthesis. This epoxidase could therefore be 
used in conjunction with monBI/monBII gene products to
effect oxidative reactions on suitable substrates in vitro
and in vivo. Similarly the monCII gene product is a
putative cyclase that opens the epoxides and causes the 
formation of ether rings in monensin. 

Any or all of the monBI, monBII, monCI or monCII
genes may be introduced into a heterologous strain
containing the gene cluster for another polyether, in
order to divert the biosynthetic pathway and produce a
polyketide of altered structure. In these experiments the
analogues of these monB genes could either be present or
(once located and characterised using the mon genes as
probes) they may be deleted prior to the introduction of
the monB and monC genes into that strain. The converse
experiment in which analogues of the monB and monC genes
from other strains are introduced into S. cinnamonensis
likewise has the potential to produce novel oxidised
polyketides. Also, the monB and monC genes or their
analogues may be introduced into a strain that normally
produces a macrolide or a polyene or some other complex
polyketide and expressed there, when they may effect the
diversion of the growing polyketide chain on a
heterologous modular PKS towards a new product, which may
or may not have the structure of a polyether.

The availability of the monensin gene sequence allows
the institution of domain swaps to alter the
acyltransferase (AT) specificity of a given module, for
example the ethylmalonyl-CoA specific extender found in
one of the modules of the monensin PKS can be used to
replace one of the other ATs to generate an ethyl side
branch at that position in the chain, or the AT can be
used to substitute in any other (e.g. macrolide) PKS, as
described in WO 98/01571 and WO 98/01546. Similarly the
alteration of the level of reduction in a module, by
manipulation of the reductive enzymes, can be applied to
the monensin genes and here it will produce, depending on
which module is affected, either an altered monensin, or a
species which is only partly cyclised, or a polyether with
an altered pattern of cyclisation, or even a linear
polyketide.

In general the targeted alteration of the pattern of
substitution of sidechains or reduction level along the 
polyketide chain produced by the monensin PKS will, like 
the disruption or deletion of the oxidative enzymes 
mentioned above, lead to non-polyether polyketide 
products. It should be possible, by introduction of the 
DEBS thioesterase at the C-terminus of one of the later 
modules of the monensin PKS, together with an 
appropriately placed hydroxy group earlier in the chain, 
to produce novel macrolide products from this polyether 
PKS system, or alternatively novel polyenes of defined 
chain length and chosen ring size. 
Example 1

Cloning of the monensin A biosynthetic gene cluster using
DNA probes derived from the erythromycin-producing
polyketide synthase of Saccharopolyspora erythraea

A genomic library of the monensin A producing strain
Streptomyces cinnamonensis ATCC 15413 was constructed
using methods well-known in the art, namely, the
production of high molecular weight genomic DNA, followed
by the partial cleavage of this DNA using the
frequent-cutting restriction enzyme Sau3A, fractionation of the
fragments on a sucrose gradient and selection of fragments
of average size 35-40 kbp, and the cloning of these
fragments into the cosmid vector pWE15 (Evans, G.A. et al.
Gene (1989) 79:9-20) which had been previously digested
with BamHI and treated with shrimp alkaline phosphatase.
The library was packaged and transfected into Escherichia
coli XL1-Blue MR cells. The library was plated out on
2xTY agar medium (10 g tryptone, 10 g yeast extract, 5 g
NaCl, 15 g bactoagar per litre containing ampicillin 50
µg/ml) for cosmid selection and the colonies were allowed
to grow overnight. The library was then screened by
hybridisation using as a probe DNA encoding the
ketosynthase domain of module 1 of the
erythromycin-producing PKS (6-deoxyerythronolide B synthase, DEBS) of
Saccharopolyspora erythraea. The colonies giving a
positive hybridisation signal were
selected and the cosmid DNA from each colony was purified
and mapped by restriction digestion. The presence of the
target biosynthetic genes on a cosmid was verified by
sequencing of the ends of the cosmid inserts using the
commercially available T3 and T7 primers which hybridise
specifically to the respective ends of each cosmid insert
(Evans, G.A. et al. Gene (1989) 79:9-20).
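
As a rough illustration of the digestion step, the Sau3A
recognition sites (GATC, cut on the 5' side of the G) in a known
sequence can be located and the fragment sizes computed as in
the Python sketch below. A complete digest cuts at every site; a
partial digest of the kind used for library construction gives
fragments bounded by any two cut sites, so the sketch simply
enumerates every such fragment that falls within a chosen size
window. The function names and the toy sequence are
assumptions for illustration, and the model ignores everything
that affects a real digest (enzyme concentration, time,
methylation).

```python
def sau3a_sites(seq):
    """Return the 0-based start position of every GATC site in seq."""
    seq = seq.upper()
    sites = []
    i = seq.find("GATC")
    while i != -1:
        sites.append(i)
        i = seq.find("GATC", i + 1)
    return sites


def complete_digest_lengths(seq):
    """Fragment lengths when a linear molecule is cut 5' of every GATC."""
    bounds = [0] + sau3a_sites(seq) + [len(seq)]
    return [b - a for a, b in zip(bounds, bounds[1:]) if b > a]


def partial_digest_lengths(seq, min_len, max_len):
    """Sorted lengths of all fragments bounded by any two cut sites
    (or molecule ends) whose size lies within [min_len, max_len]."""
    bounds = sorted(set([0] + sau3a_sites(seq) + [len(seq)]))
    lengths = []
    for i, left in enumerate(bounds):
        for right in bounds[i + 1:]:
            if min_len <= right - left <= max_len:
                lengths.append(right - left)
    return sorted(lengths)


# Hypothetical 16-nt toy sequence with two GATC sites.
toy = "AAGATCTTTTGATCCC"
toy_sites = sau3a_sites(toy)
toy_fragments = complete_digest_lengths(toy)
```

Because BamHI (GGATCC) leaves the same GATC overhang as Sau3A,
fragments generated in this way can be ligated directly into a
BamHI-cut vector, which is the arrangement described above.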
Example 2 

Sequencing of the biosynthetic gene cluster for monensin A
from Streptomyces cinnamonensis

Three cosmids obtained by screening of the genomic 
library of S. cinnamonensis were used to obtain the entire 
DNA sequence of the monensin biosynthetic gene cluster. 
These cosmids, MO.CN02, MO.CN11 and MO.CN33 between them 
contain the entire DNA sequence of the cluster and the 
adjacent regions of the chromosome. They have been 
deposited in NCIMB, 23 St Machar Drive, Aberdeen AB24
3RY, UK, under the NCIMB accession numbers 40956
(MO-CN11); 40957 (MO-CN33) and 40958 (MO-CN02)
respectively.

The DNA of each cosmid was separately subjected to 
partial digestion with Sau3A and fragments of 
approximately 1.5-2.0 kbp were separated by agarose gel 
electrophoresis. The fragments were then ligated into the 
plasmid vector pUC18 (Messing, 1982), previously digested 
with BamHI and treated with shrimp alkaline phosphatase. 
The library was transformed into E. coli strain XL1-Blue
MR and plated on 2xTY agar medium containing ampicillin
(100 µg/ml) to select for plasmid-containing cells.
Plasmid DNA was purified from individual colonies and 
sequenced using the Sanger dye-terminator procedure on an 
ABI 377 automated sequencer (Sanger, F. Science (1981) 
214:1205-1210). The sequence data obtained from single
random subclones of a cosmid were assembled into a single
continuous sequence and edited using the GAP 4.1 program of the
STADEN gene analysis package (Staden, R. Molecular
Biotechnology (1996) 5:233-241).

The sequence is set out in the appended sequence
listing.
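
For readers wishing to work with the listing computationally,
the following Python sketch shows one way of turning rows of the
form "position followed by blocks of ten bases" into a single
contiguous string, checking as it goes that each row starts at
the expected coordinate. The function name and the error
handling are assumptions of this sketch; it presumes the rows
have already been cleaned of extraction damage such as spaces
inside a position number.

```python
def parse_listing(text):
    """Join listing rows ('51 GCAGCGCGAT GTCGGCAGGC ...') into one string.

    Raises ValueError if a row does not start at the coordinate implied
    by the rows before it, or contains a character other than A, C, G, T.
    """
    parts = []
    expected = 1                       # listings are numbered from 1
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue                   # skip blank lines between rows
        position = int(fields[0])
        bases = "".join(fields[1:]).upper()
        if position != expected:
            raise ValueError(f"row starts at {position}, expected {expected}")
        if set(bases) - set("ACGT"):
            raise ValueError(f"unexpected character in row {position}")
        parts.append(bases)
        expected += len(bases)
    return "".join(parts)


# The first two rows of the appended listing.
first_rows = """
1 GATCAGCGCG GTGGCGTCGT CGGCGTCCAG CTCGTTCTGC GTGGCGGACG
51 GCAGCGCGAT GTCGGCAGGC ACCTCCCAGA CCCGGCGGCC CGGCACGAAG
"""
first_100 = parse_listing(first_rows)
```

The coordinate check is deliberately strict, so that a dropped
or duplicated row is reported rather than silently shifting
every downstream position.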

Tables I and II contain data about individual genes 
and gene products. 
Example 3 

Inactivation of the monensin A biosynthetic gene cluster

A chromosomal gene disruption experiment was used to 
verify the identity of the cloned polyketide synthase gene 
cluster. Plasmid pMOB6314 is a pUC18 sequencing subclone 
of the presumed monensin A biosynthetic gene cluster 
prepared as described in Example 1, whose inserted DNA 
comprises the DNA sequence from nucleotide 9763 to
nucleotide 10108 in SEQ ID 1, and which therefore contains
a region of DNA wholly internal to orfE, a putative
3-O-methyltransferase. A HindIII fragment containing the
thiostrepton resistance gene tsr from plasmid pIJ702
(Katz, E. et al. J. Gen. Microbiol. (1983) 129:2703-2714)
was cloned into the HindIII site of plasmid pMOB6314 and
the ligation mixture was used to transform E. coli cells. 
Transformants bearing the required plasmid pMOAEOl were 
identified by isolation of plasmid DNA and analysis by 
restriction digestion. Plasmid pMOAEOl was used
to transform protoplasts of Streptomyces cinnamonensis as
described by Hopwood D.A. et al. (1985). Since plasmid
pMOAEOl lacks an origin of replication that is active in
Streptomyces, growth in the presence of thiostrepton (25
µg/ml) in the regeneration medium led to the isolation of
stable integrants. Isolated putative integrants were
tested for the presence of integrated pMOAEOl sequences by
Southern hybridisation. A clone of Streptomyces 
cinnamonensis identified by its restriction pattern in 
Southern hybridisation as bearing pMOAEOl integrated in 
the region of monE of the monensin A biosynthetic gene 
cluster was designated S. cinnamonensis MO-DD01. 
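
The coordinates quoted for the pMOB6314 insert are 1-based and
inclusive, as is usual for sequence listings. Reading the start
coordinate as 9763 and the end as 10108, the insert is therefore
346 nucleotides long. The short Python sketch below makes the
arithmetic explicit and shows how such a region would be sliced
out of a sequence held as a string; the function names are
assumptions of the sketch.

```python
def interval_length(start, end):
    """Length of a 1-based, inclusive interval start..end."""
    if start < 1 or end < start:
        raise ValueError("coordinates must satisfy 1 <= start <= end")
    return end - start + 1


def extract(seq, start, end):
    """Return seq[start..end] using 1-based, inclusive coordinates."""
    interval_length(start, end)        # validates the coordinates
    if end > len(seq):
        raise ValueError("end coordinate lies beyond the sequence")
    return seq[start - 1:end]          # Python slices are 0-based, half-open


insert_length = interval_length(9763, 10108)
```

A homologous region of a few hundred nucleotides that lies wholly
inside the target gene is what allows a single crossover with
the non-replicating plasmid to interrupt that gene.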

Detection of production of the monensin A related 
metabolites produced by S. cinnamonensis MO-DD01 was 
performed by GC-MS analysis of methanol extracts of the 
entire broth harvested after 72 hours of growth of the
strain. No significant amounts of monensin A-related 
metabolite production were detectable. 
Example 4 

Overproduction of erythromycin aglycone in Streptomyces
cinnamonensis

S. cinnamonensis is a suitable system for 
overproduction not just of monensin A but also of other 
polyketide metabolites. Established techniques of genetic 
transformation allow fast introduction of foreign 
polyketide producing gene sets into this host. Fast
growth of S. cinnamonensis in liquid culture and optimal
precursor supply favour high yield of polyketide
metabolites.

Protoplasts of S. cinnamonensis were prepared by a
modified procedure of Hopwood et al. (1985). Plasmid
pIB061 was transformed into the protoplasts of S.
cinnamonensis and stable thiostrepton resistant colonies
were isolated. Individual colonies were checked for their
plasmid content and the presence of plasmid pIB061 was
confirmed by its restriction pattern. S. cinnamonensis 
(pIB061) was inoculated into 250 ml of M-C3 minimal 
production medium containing 10 µg/ml of thiostrepton and
allowed to grow for 72 hours at 30 °C. After this time the 
mycelia were removed by filtering. The broth was extracted 
with two volumes of ethyl acetate and the combined ethyl 
acetate extracts were washed with an equal volume of 
saturated sodium chloride, dried over anhydrous sodium 
sulphate, and the ethyl acetate was removed under reduced 
pressure to give about 200 mg of crude product. The 
product was analysed by LCQ and its mass was confirmed to
correspond to that of erythronolide B.

This example demonstrates the importance of S. 
cinnamonensis for production of high levels of foreign 
polyketide antibiotics. Introduction of the complete
erythromycin gene cluster or other gene clusters into this
system is likely to produce high levels of the
corresponding metabolites.
1 GATCAGCGCG GTGGCGTCGT CGGCGTCCAG CTCGTTCTGC GTGGCGGACG
51 GCAGCGCGAT GTCGGCAGGC ACCTCCCAGA CCCGGCGGCC CGGCACGAAG
101 CGGGCCGAGG CGCCGCGGCG CTGGGCGTAG GTGTCCACGC GGGCGCGTTC
151 GACCTCCTTG ACCTGCTTGA GGAGGTCCAG GTCGATGCCC TTCTCGTCGA
201 CGACGTAACC GGAGGAGTCC GAACACGTCA CGGCGTTGGC GCCCAGGGCG
251 GCGAGCTTCT GGATGGTGTA GATGGCGACG TTCCCGGAGC CGGACACGAC
301 CGCCGTCCGG CCTTCGAGGG TCTCGCCGCG CTCACGCAGC ATCGCCGCCG
351 CGAAGAGGAC GTTGCCGTAG CCGGTCGCCT CCGGACGGAT CAGGGAGCCG
401 CCCCAGTTGC GGCCCTTGCC GGTGAGGACG CCCGCCTCCC AGCGGTTGGT
451 GATGCGCCGG TACTGACCGA ACAGATAGCC GATCTCCCGG CCGCCGACGC
501 CGATGTCGCC CGCGGGCACG TCCGTGTGTT CGCCGATGTG CCGGTACAGC
551 TCCGTCATGA ACGACTGGCA GAAACGCATG ACTTCCGCGT CGCTGCGGCC
601 GCGCGGGTCG AAGTCGCTGC CGCCCTTGCC GCCGCCGATG CCGAGGCCCG
651 TCAGCGCGTT CTTGAAGATC TGCTCGAAGC CCAGGAACTT GATGACGCCG
701 AGGTTCACCG ACGGGTGGAA GCGCAGGCCG CCCTTGTACG GGCCGAGGGC
751 GCTGTTGAAC TCCACCCGGA AGCCGCGGTT GACCCGCACG CGACCGTGGT
801 CGTCCTGCCA CGGCACCCGG AAGACGATCT GGCGCTCCGG TTCGCACAGG
851 CGCTCGATCA GGCCGGCTTC GGCGTACTCG GGGCGAGCCG CGATGACCGG
901 CGCCAGGGTC TCGAGGACCT CGCGGGCGGC CTGGTGGAAC TCCGGCTGGG
951 CCGGGTTGCG GTGTTCGATC TCGGTGAGCA GCTGGGAGAG TGCTGTCTTC
1001 TGCGAGAGAG CTGTCTTCGT GTCGGGTCGC GTGGTCAAAG GAGCCCTTTC
1051 TGGCACGGCC GGCGTAGGCG CTCGGCGCCG TTGCCGTGCG CAGGGAGACG
1101 CTCGAGCCGC AAGTATGACG CGCATGTAAA CACAGCGACC AGCCCCCCGG
1151 TCCAGGGAGT GACCACCATG CGAGACCGGG CCACCGGTAG GGCCACCGGT
1201 CCGGCCTGCG GACCCCGTGT CACTTCCGGC TCGCGGCCAG GGGTGCCGCC
1251 CGGCGGACCG AATCGGCGGA GGCGGCCAGC AGTGGCATGC GGACGGCCGG
1301 GCTGGGAATG CGGTTCTGGG CGTGCAGCAC TCCCTTGATC ACCGTCGGGT
1351 TCGGTTCGGT GAAGAGGGCG GCGGAAAGGC GGGCGAGGTC GGCTCCGAGA
1401 GCGCGGGCGG GTGCGGCGGA GCCGCGTCGC CACAGCGCGA TCATCTCGGC
1451 GTAGTCGGCG GTACGCAGAT TGGCCGACGC CACGATTCCG CCGTGGGCGC
1501 CCGCAGCGAC CAGCGGCGAG AGGACGATGT CGTCACCGCC GAGCACGGCG
1551 AAGCCGGGCA GGGGCGAGTC GAGCAACTCC ATGGTGGTCG GGTCGATCGA
1601 GCCGGTCGCG TGCTTGATGC CGACGACCTC CGGCAGGCGG CCGAGTGCGG
1651 TGATCGTGCC CGCGCCGAGC GTCTGCCCGG TGCGGTAAGG GATGTCGTAC
1701 ACGACCAGGG GGAGGCCGCC GTGCTCGGCC AGCGCGGCGA AATGAGCCAG
1751 GGTCCCCGCT TCCCCGGGGC GGATGTAGGG CGGCGCGGGG ACCAGCGCGG
1801 CGGCGACGTC ACCCCGGGCC GCCAGCTCTC GCAGGGCCGT GATGGCGGTG
1851 GCGGTGTCGT TGGTGCCCAC CCCGACGATG AGCGGTGCCC CGTGTGCCCG
1901 GCACGCGGCC GAGCAGACGC GGATCACCGT CTCTCTCTCC TCGGCGGTCA
1951 GTGTGGCGGC CTCGGCGGTC GTACCGAGGG CGACGAGCCC GGAGGCGCCG
2001 GCCGACAGCG CCTCGTCGGC GAGTCGGGCC AGCGCCTCGG GGGCCAGGCG
2051 CAGATCGTCG GTGAACGGAG TTACCAGGGG GACGTACAGG CCGTTGAAGA
2101 GCGGTTCGGT GGTCGGTTCG AGGCTCGATG CGAGGGTCAT GCTCTTACCC
2151 TGGCCCACGC CACTCGGTAG ATCCATTTCA GATTCCTGCC GTCACACCTA
2201 AGCTGAACTT ATGCTCGATG TCCGTCGCCT CCATCTGCTC CGCGAACTCG
2251 ACCGGCGGGG CACCATCGCC GCCGTGGCCG AAGCGCTGAC CTTCACCGCG
2301 TCCGCCGTCT CCCAGCAGCT CGGCGTGCTG GAGAGGGAGG CGGGCGTGCC
2351 GCTGTTGGAA CGCAGCGGCA GGCGCGTGGT CCTCACGCCC GCAGGACGCT
2401 CCCTCGTCGC ACACGCCGAC GCGGTGCTGA ACCGTCTCGA ACAGGCGGTC
2451 GCCGAGCTGG CGGGCGCACG GGACGGCATC GGCGGGCCGC TGCGCATCGG
2501 GACGTTCCCT TCCGGCGGCC ACACCATCGT CCCCGGCGCG CTGGCCGAAC
2551 TGGCCTCTCG TCACCCCGCG TTGGAGCCGA TGGTGCGGGA GATCGACTCC
2601 GCGCGCGTCT CCGACGGTCT GCGGGCCGGT GAGCTGGACG TGGCCCTCGT
2651 ACACGACTAC GACTTCGTAC CCGCGACGCC GGACACGACC GTGGACGAGG
2701 TGCCTCTGCT CGAAGAGCCG ATGTACCTCG TCACCCATGC CGCGGACACT
2751 GCCACGGACT CCGGCTCCGG GAGCACACTG GCAGCGCTGC TCGGGCCCTG
2801 TGCCGAGGTT CCGTGGATCA CGGCGCGGGA CGGCACGACC GGTCACGCGA
2851 TGGCTGTACG CGCCTGTCAG GCCGCCGGGT TCCAGCCCAG GATCCGCCAC
2901 CAGGTCAACG ACTTCCGCAC GGTGCTGGCT CTGGTCGCCG CCGGGCAGGG
2951 GGCCGGGTTC GTGCCGCGGA TGGCCGCCGA GCCGAGCCCC GCGGGCGTGG
3001 TGCTCACGAA GCTGCCGCTG TTCCGTCGCT CGAAGGTCGC GTTCCGTGCG
3051 GGCGGCGGTG CCCATCCGGC GATCGCCGCT TTCGTGGCCG CGGCGACGAC
3101 GGCGGTCGAA CGCATGGCGG GTTCACGAGG CCCGGCCGGC GGCTCTGAGT
3151 GAACCGGCCG ACCGTGGGAA TGTGTGTGCC CTGGGCCGCA CCATTCGTGG
3201 CCTGGTGACG TCCTGGCGAC GTCCTGACGT CCTGATGTCC GAACGAGAAG
3251 GCGATTTTCC GCGATGGCCG ATGACGCGTA CCTGTTCCTC CTCCCCGACC
3301 GGCACCCCCG ACTGGGAGCG GCCCTCGCCG CCGTCGGTGC CTTGGAATGC
3351 ACGGAAACCC CTGCGGTGCA CGCCTGGTTG CAGGCTCATG AGGCCTCCGT
3401 GTCCTCGGAA CAGGTCAGGA TTCTGCCCGC CGATGCCGAG ACACTCATCC
3451 CGAAGGACGC CGAGCGGCTG CCGGTGCCGT TGAGCGAGGA GGAGGCGCTC
3501 AAGGTCGAGC AGGAGTGCGC GCCCCAGACC GTCACGGACA TGGAGAGCGA
3551 ACTGCTCGCG TTCCGGGAGA CGACCCAGGA CTGGCAGGCC CTCGTGCACC
3601 GGGCCCTGAC CGCGGGCATC CCCGCGCAGC GCATCGCCCG GCTGACCGGA
3651 CTCGACCCGG AGGAGATCGG CCGCCTGTAG GCGCTAGCGG CCGCCCAGTG
3701 CGGACACCAG GATGGCGACC GTGACGGTGT TGAAGACGAA GGCGATGACC
3751 GTGTTCGCCG CCACGGTCCG TCGCATGTCG CGTGAGGTGA CGTCGACATC
3801 GGTGGTGCCG AACGTCGTCA TCGCGGCCAG GGCGAAATAG ACGTAGTCGG
3851 CCCAGGCGGG ACTCCGCTCC CCGGGGAATT CCAGTGCCCG CTCGTTCTCC
3901 ACGAGGTTGT CGGCCTGGAA GGTGACGGCG AAGGCCACGA CCACGCAGAT
3951 CCAGGCGGCG ACGACCAGGG CGAGCGCCAC CAGGGTGCGG GGGAGCGCGG
4001 AGAAGGTGGT GCTGAGGTGG CCGGGAAGCC ACAGCACCGC CACCACCAGC
4051 GCGGCAGCCG CGATGAAGAG CGAACCCCCG GGGCCGGGCG CGGTTCCGAG
4101 GACGTAACGC TGCAGGAATG TGCCGCGGGC TTCGCGCCGC GCCCAGGAGC
4151 GGACCTGCTC CGGAGCGACG CTCACGAAGA CGGTCATGGT GATGGCGAGG
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42 01 TAGGGCAGCA GGTAGGCGAA GAAGACGAGC ACGCCGACAT CCGCTGCCGA 

42 51 AATCCGCACC ACGGCGTCGA TGGGGAGGAC CACTGCCGCG CACGCCGCGA 

4 3 01 CGGCCAGGCT CACCGCCGAC CGGCGCCGTT CGGAAAGCCA GCGATGCACG 

4351 GACGAGCCTC TCTGGTCGGG CGTCGGGCCT CGTGTGATCG TGACCGGCTC 

44 01 CGCGCCCGCC GAAAGCGCGG TGCGATCTCC TGCCCTCGAA CGAGCGAAAC 

4451 GCTTGCGCCG GAAAGCCTCC CTGCTGATGC CGACGGCGGC GGCAGTGGCT 

4501 GCGGATGCGG ATCGTGCGCT GTGCCCTGAC CCTGGATGGG GGGAGGAACG 

4 551 CAGAGAGGCA GGTGCGCCCA TGACGGTCAT GGACAAGCTC AAGCAGATGC 

4 6 01 TCAAGGGGCA CGAGGACAAG GCCGGCCAGG GAATCGACAA GGCGGGCGAC 

4651 TTCGTCGACG GGAAGACGCA GGGCAAGTAC AGCGGTCAAG TCGACACGGC 

4 701 CCAGGACAAG CTCCGGGACC AGTTCGGCTC GGATCAGCAG GAGCCTCCGC 

4 751 AGAGGTAGGC AGCGTCAGGG CGGAATCGGT CCGGGCGACC GCTGACCGCT 

48 01 GATGCAGATG CCGCAGACGT CGGCCCCGCA CTCCTCCGGG TAAATCGGAG 

4 851 CGTAGGCGGG GCCGACGTGT GCGCGTGCGG CCTCGTCTCT GCCGCCCCTC 

4 9 01 TCCGCCCCGT CTCTGGCCCC TTGGTGCCAG TCTGACGGGA AAATGGCACC 

4 951 ACTTGGTGCC ACGCATGTGC CATGATGGCG TCATCGAGAG CGCGCTGCCC 

5001 CGACTCGCGG GCAGGAAGGG CGCGTTCCGC GGAGTCGGCC GTCGGAGGGG 

5051 TTGCATCATG GGGACAG CAC AGAGCCAGGA GCAGGCCGCC GCGCCCGGTG 

5101 CCTGCGCCGC CTTCGTCCGC TTCGTGCTCT GCGGTGGCGG AGTGGGCCTC 

5151 GCCTCCAGCT TCGCCGTGGT CGCCCTCGCC TCCTGGGTTC CCTGGGCGCT 

5201 GGCCAACGCC CTGGTCGCCG TGGTCTCCAC CGTCGTCGCC ACCGAGCTCC 

5251 ACGCCCGCTT CACCTTCGGT GCGGGCGGGC GCGCGACCTG GCGGCAGCAC 

53 01 GCGCAGTCGG CCGGGTCCGC GGCGGCCGCG TACGCGGTGA CCTGCGTGGC 

53 51 GATGTTCGTC CTGCAGCAGC TGGTGGCGGC GCCCGGCGCG GTGCTCGAGC 

54 01 AGGTCGTGTA CCTGTCGGCC TCCGCGCTCG CCGGTGTCGC GCGGTTCGTG 
54 51 GTGCTGCGCC TCGTCGTCTT CGCCCGGAAC CGCTCGCTGC CCGCCGCGGC 
5501 CGCCGTGCGC ACCGCGCGTC CCGTGCGTCG CGTGCCGGCG CCCGTGCCCG 
5551 CGACCGTGGC CCACGCCGCA TCGCGCCCGG CCGGCCCCGC GGCGCTCTGC
5601 CCCGCCGCAT GACTCCGTGC CCGCATGTTT GTGCCCCCGG TGCTCCGTGC
5651 GTCCGGGGGC GGGGTGGGCG TCGTGCCCGG GTGGTCCAGG GGTCACGCGG
5701 TGGTGTGTGC CAGTTCCTGG CCGAGGTGGT GGGCGAGCTG TGCGGGCGTG
5751 GGGTTCTCGA CGATGGCGAC CATCGCGATC TCCATCCCGG TCAGCGTCAT
5801 CAGTGTCTTG GTGAGCTCAA GGGCGGTGAG GGAGTTGAGA CCGTTCTCGA
5851 GGAAGTTGCT GTCGTCGCTG AGGGTGGTGT TCAGAAGGGT GCCGGCCTGG
5901 GTGCGGATGG TGTCGGTGAG GAGCTTCTCG CGCTCCTCGG GGGTGGCCGC
5951 GGCGAGCTGC TTCTCCAGCT CGGTGGCGTC CTGGCCGGAG GTGTGGTCGG
6001 TGCTGGTCAT GACTGCTCCT GTGTGAGTGA GGTGTTGGCG GGGGTCACAC
6051 CGCGGCGTGC GCGGTGTGGT CGTGCAGCCA GTAACGCGTG GCCTGGAAGG
6101 AGTACGTCGG GAGGTCGATG GTCCGGGGGT GGGGGGTGCG CCGGACGAGA
6151 GGGGTCCAGT CGACGGTGCC GCCCGTGGTG TGAAGCCGCG CGAGGGCGGT
6201 CAACAGGGCG CTTACGGCGG AGGTTTGCGT GCCTTCCGGT GAGAGCGCGC
6251 CCAGGTGGAG AAGCGTGTGG GTCTCGGGGG TGGGGGGTGC GGTGGGGGCC
6301 GGCGAGGTGA GGTGGTGGTG CCAGTAGTCG GCGGAGGCGA TGGGGGTGTC
6351 GGCCGGGGCA GTGCTGGTGA GCGTGAGCGT GGCGCGTTGG AACGTCAGCT
6401 GCTTCAGCAC GGGCTCGTAG GCGTCGGGCG GAGCCGGTTG TTCACCCTCG
6451 GCGGCCTGGG CGGCAGCGGC GTGGGCGGCG GCCAGGCGGC ACGCGTCGTC
6501 GAGGGTCAGG ATTCCCGCGG CGTACGCGGC GGCGATGTGG CCGACACCGT
6551 CGCCGGTGAG GGTGTGGGGG CGTACCCCCG TTTCCAGGAG CAGCCGCGCG
6601 AGCGCGGTGT GGACCGCGAA GCGCGCCAGT TCGGAGTGGG GAGTGGGGAG
6651 GGGAGTCGGC AGATGGGTGT CGAGGAGCGC GCGCGCTTCG TCGAAGGCGG
6701 ACGCGAAGAG CGGGAACGCC GAGTGGAACT CGGCACCTCC GAAAGCCGCG
6751 CCGAATGTAG CGCCGAATGT CGCGCCGGGT TTGGCTCCGG GTGCCGCCCC
6801 CGTCGTCACC CCGTCGGCCG GGCGGCCGTC GAAGTGCCAG GCGATCTTCT
6851 TCGGGCCGGC CCCGGGCGTG GACCTGACCA GGTCCGGGTG GTCCTCTCCG
6901 GCGGCCAGGG CGCGGGCGGC GGCGAGGAGT TCGGTGTGGT CGGTGCCGGT
6951 GAGGACGGCG CGGTGTTCCA GGGGGCTGCG GGTGGCGGCG AGCGAGTAGG
7001 CGACCTCGGC GGGGGAGGGC GCGGGGTCGG TGGCCGCCAG GTGGGTGACG
7051 AGGGCCTTCG CCTGTGCCCG CAGGGCCTCG GGTGTACGAG CGGACAGGCT
7101 CCAGGCCACC GGGAGTTCCG GGGCAACGGG CGACGTCTGG TCGCGGGCGG
7151 CATCCGGCAC CGGAGCCTCG TCCACCGGCG GCTCTTCGAG GATGAGGTGC
7201 GCGTTCGTGC CGGACGTGGC GAAGGCGGAG ATGCCGACCC GGCGGGGCTC
7251 CTCGCGGCGG GGCCAGTCGA CCGCCTCGGT GAGCAGCCGT ACCGCGCCCT
7301 TCTTCCAGGC GGCGAGGGGC GTCGGGCGGT CGACGTGGAG GGTCGGCGGC
7351 AGGGTGCCGT GCCGGAACGC CTGGACCATC TTGATGAGCG CGGCCGCACC
7401 CGCGGCCCCC TGCGTGTGCC CCGTGTTGGA CTTGACGGAG CCGAGCCACA
7451 GGGGCCGGTC GGGGGAGCGG TCGGCGCCGT AGGTGGCGAG GAGGGCCTGG
7501 ACCTCGATGG CGTCGCCGAT GGGGGTGCCC GTCCCGTGCG CCTCGACGGC
7551 GTCGATCTGG TCCGGGGTGA GCCCGGCGTC GGCGAGGGCG GCGCGGATCA
7601 CATGCTGCTG GGAGGGGCCG TTGGGGGCGG CGAGGCCGTA TCCGGCGCCG
7651 TCCTGGTTGA CCGCGGAGCC GCGGATGACG GCGAGCACCG GGTGGCCGTT
7701 CTTCCTGGCG TCGCCGAGCC GCTCAAGCAG GACGAGGCCG ACGCCTTCAC
7751 CGAGGCCCAT GCCGTCGGCC GCGGCGGCGA ACGGTTTGCA ACGGCCGTCC
7801 TGCGCGAGCG ACTTCTGGTG GGCGAAGGCG TGGAAGGTGT GCGGCGTCGA
7851 CATGACGGTG CCGCCGCCGG CGAGGGCGAG GCCGCACTCC CCGCGGCGCA
7901 GCGCCTGGCA GGCCAGGTGG AGGGCGACCA GGGAGGACGA GCAGGCCGTG
7951 TCCACGCTGA TGGCGGGGCC CTCGAGGCCG AGGGCGTAGG CGATGCGGCC
8001 GGAGACGAGG CTGCCGGACG TGCCGCCGCC CAGATAGGGC AGCAGCTCGT
8051 CGGGCGCGGT CTCGAGCCGT GTCGCGTAGT CGTGCCCGGT GGCGCCGACG
8101 TAGACGCCGG TGAGGGTGGA GCGCAGGGTG TGCGGGGCGA TGTGGCCGCG
8151 TTCGACGGTC TCCCACGCGA GGTGGAGCAT GAGGCGCTGG AGGGGTTCGG
8201 TGGCCACGGC CTCGGTGTCG CTGATGTCGA AGAAGCCCGC GTCGAAGCCG
8251 GCCGCGTCGT CCAGGAACCC GCCGAGCTCC GCGTACGGGC GTTCCTCGGG
8301 GAGTTCCCAG GCGCGGTCGT CGGGGAAGCC GGTGACGGCG TCGCGGCCCT
8351 CGGACACCAG ATCCCACAGG TCGTCCGGGG TGCGGGTCTT GCCGGGCAGC
8401 CGGCAGGCCA TGGAGACGAC GGCGATCGGC TCGTGCTGTG CGGCCTTCAG
8451 TTCGCGCAGC TGCTGCTGGG CCTGGTGGAG CTCGGCCGTC GTCCACTTGA
8501 GGTATTCGAC GAGCTTCTCT TCGTTCGCCA CGGGAATGGT CAGCCTTCCT
8551 GTTCTCGCGC GTGAAGCCTC AGGTGGGACG AGGTCGGGCA AGGTGGGCAG
8601 GCAGGAGCCG CGCGCTGTGG GTGCCAGGGT CGCCGCGGCT GCTTAAGCGG
8651 GTCTAACTCC CGCCTTGCCG CCGGGCATCG CCTCGCACGA GCGGGCCAGC
8701 AGCAGGAGGT CGGCGGCGAT CTCGTCGGGT GCGCCGGCGT GCAGATCGTG
8751 GTCGGAGCCC GGGTACCAGC GCACGCTCAC CTGCTCCAGG GCCGCCTCGG
8801 CGGCGGCCAC CCAGGCCCGT ACCTGGTCGG ACAGTTGGGG GATGGCGGGG
8851 ATGAGGGGCA GCAGCCGCAC CGGCACGGTG ACCTTGGGAT ACCAGTCGGC
8901 CGGTGCCTCC CGTTGCAGGC CGGCGACGAT CGACATGACC TGTGTCGAGG
8951 TCAGGCGGGG GATGAGCAGG CCGTCCGGCC CGACGCGGTA GTCCGCCAGG
9001 CGTGCCTCGA TGGACGTGGG CGACCAGTCG GGATGGGTGG CCCGCAGGTA
9051 GGCGCGCATG TCGGCGGCGC TGGTGGTGCC CTGCTGGGCG CGCCGGACCA
9101 CGTCGGCGGT GCGCTCCCAG AAGGCGCGCA TCACCGGTCC GTCGAACTCG
9151 TACCAGCCGC CGTCGATCAG GGCGAGACCG GCCACCAGGT CCGGGTGCTC
9201 GGCCGCCAGG CGCAGCGCGA GGTGCGCGCC CCAGGAGTGC CCGGCCACCA
9251 GTGCGCCGGA CAGGTCGAGG GCGGTGACGG CCGCCACCAG GTCGGTGACG
9301 ACCGTCGCGT TGTCGTACCC GTCGGGCGGG GTGTCCGACT CGCCGTGGCC
9351 GCGGTGGTCG ACGGCGTAGG CCGGGTGTCC GGCGGCGGCG AGACGGGCGG
9401 CGACCTCGTC CCACATCCGG GCGTTCGACA GCATGCCGTG CAGCAGCAGG
9451 AACGGACGGC CCGGGGCTCC CGGCCCGTCC GCGGGCCGGT ACCTGACATT
9501 GAGGGAGACG GTCTGCGACA CGGGGATGCG GAGGTTCTTC ACAGGCGGGC
9551 CCTTGTGATC CCTTGTGCTG GGGGAGGAAA GCGGGGGCGG CACGCTCAGG
9601 GGCGCTGCGC GGTCGCGAAG ATGTATCCGA GCTCGGGCAT CTTGCCGAGG
9651 GCCGCCTGGT TGTGCAGGAA CAGCTCGTAT CCCTCTACGC CGATGATGTC
9701 GACGTACTCG TCCCGGTGGG CGCGGATCCA CTCGACGTAA CCGTCGTAGG
9751 TCTTGGCGGT CTCGCGGGTG ATGTCGGTCA GTTCGAGGAC GGTCCAGCCG



9801 GCGGCGCGGA AGATGTCGGG GTAGTCCCCG ATGTCGGTGA GCGCGGCGTA
9851 GATCGTGGTG TCGCTGACGG TGGCGGTCCG GGGCCGGCTG GGATCGGGGT
9901 TGAGGTAGAC CATGTCGGCG ATCGGCATCC GCGCGCCGGG CTTCACGACG
9951 CGGTGGGCCT CGGTGAGCAC CTGCTGCTTG TCCGGCATGT GCAGCATGGA
10001 CTCCAGGGCC CAGCAGTGGT CGAAGGAGCC GTCGTCGAAC GGCAGGTTCA
10051 TGGCGTCGAC CTGCTCGAAG CGGACCCGGT CGGCGAGGCC GGCCTCGCGC
10101 GCCCGGCGGT TGCCGCGCTC GACCTGGCGG GCGCTGACGG AGATGCCGAC
10151 CACCTCGACG TCGCGGGCGC GGGCCAGCTG CATGGCCGGG GTGCCGTTGC
10201 CGCAGCCGAT GTCGAGGACG CGGTCGCCGG GGGCCGGGTC GAGGCGGCGG
10251 ATCATCTCGT CGGTCATCTG GACCATGGCC TCGTCGAACG TGGCCTGCTG
10301 CTCGCCGCCG TCGAACCAGT AGCCGTAGTG CAGATTGCCG TCTCCGAGCT
10351 GAGTCATCAG GTCGAAGACC TTGTGGTCGT AGTAGTGGCC GATGTCGCTG
10401 GGCTCGGGGG CGACGGTCTT GTTCACCGTC GGGGGCTTCT TGGTCGTCGC
10451 GTTCTTCGTC ACGGCTTCAG CGTCACCGTG CGGCGGCAGG CGCCACAACC
10501 CCACCCCCGC CCCTCAAAAG CCCCTATGGG CCCTCCTCGA CCGCCCCTAG
10551 GGAGCTGCTC TTGACGCGTT CCATACGGAA CGGGTGGTAC CCCTCCGAAA
10601 AAAATGAGAG TACGCTCCCA CTAGATATTG AGCTCTCTTT AGGAGGTCGA
10651 CTCCCATGTC TGCTGATCTG GGTGCGCGGC GGTGGTGGGC CGTCGGTGCT
10701 CTCGTACTCG CCTCGATGGT CGTGGGCTTC GATGTGACGA TCCTGAGCCT
10751 GGCGTTGCCC GCCATGGCCG ACGACCTCGG CGCGAACAAC GTCGAGCTGC
10801 AGTGGTTCGT GACGTCGTAC ACGCTGGTGT TCGCGGCCGG CATGATCCCG
10851 GCCGGCATGC TCGGTGACCG GTTCGGACGC AAGAAGGTCC TGCTCACCGC
10901 CCTGGTGATC TTCGGTATCG CCTCGCTGGC CTGTGCCTAC GCGACGTCCT
10951 CCGGCACCTT CATCGGCGCG CGTGCGGTGC TCGGTCTGGG CGCCGCGCTG
11001 ATCATGCCGA CGACGCTGTC GCTGCTGCCG GTCATGTTCT CCGACGAGGA
11051 GCGGCCGAAG GCCATCGGAG CGGTGGCCGG TGCGGCGATG CTCGCCTATC
11101 CGCTCGGCCC GATCCTCGGC GGCTACCTGC TCAACCACTT CTGGTGGGGC
11151 TCCGTCTTCC TGATCAACGT GCCGGTGGTG ATCCTCGCCT TCCTCGCGGT



11201 CTCCGCCTGG CTGCCCGAGT CCAAGGCCAA GGAGGCCAAG CCGTTCGACA
11251 TCGGCGGCCT GGTGTTCTCC AGCGTCGGTC TCGCCGCGCT GACCTACGGC
11301 GTGATCCAGG GCGGCGAGAA GGGCTGGACG GACGTCACCA CGCTGGTGCC
11351 GTGCATCGGC GGTCTGCTCG CCCTCGTGCT GTTCGTGATG TGGGAGAAGC
11401 GGGTGGCGGA CCCGCTGGTC GACCTCTCGC TGTTCCGCTC GGCCCGGTTC
11451 ACCTCCGGCA CCATGCTCGG CACCGTCATC AACTTCACGA TGTTCGGCGT
11501 GCTCTTCACG ATGCCGCAGT ACTACCAGGC GGTCCTCGGC ACCGACGCGA
11551 TGGGCAGCGG CTTCCGGCTG CTCCCGATGG TCGGCGGTCT GCTCGTGGGT
11601 GTGACGGTCG CCAACAAGGT CGCCAAGGCC CTCGGCCCGA AGACCGCGGT
11651 CGGCATCGGC TTCGCCCTCC TCGCCGCCGC CCTGTTCTAC GGCGCCACCA
11701 CGGACGTCAG CAGCGGCACC GGCCTGGCGG CCGCCTGGAC CGCGGCCTAC
11751 GGACTCGGCC TCGGCATCGC CCTGCCGACC GCCATGGACG CCGCCCTCGG
11801 CGCGCTCTCC GAGGACTCCG CCGGCGTCGG ATCCGGCGTC AACCAGTCCA
11851 TCCGTACCCT CGGCGGCAGC TTCGGCGCGG CCATCCTCGG TTCCATCCTC
11901 AACTCCGGCT ACCGCGGCAA GCTCGACCTC GACGGCGTGC CCGAGCAGGC
11951 ACACGGCGCG GTCAAGGACT CCGTCTTCGG CGGCCTCGCG GTGGCCCGGG
12001 CGATCAAGTC CAACGGACTG GCCGACTCGG TGCGTTCCGC GTACGTCCAC
12051 GCCCTGGACG TGGTGCTCGT GGTCTCCGGC GGCCTCGGAC TGCTGGGTGT
12101 GGTGCTGGCG GTGGTGTGGC TGCCCCGCCA TGTTGGTCAG AGCACCGCCA
12151 AGACAGCAGA ATCTGAGCAT GAAGCCGCAG ACGCAGTCTG ACCAGGGCAA
12201 AACAGTGCCT GGTCTGAGAG AACGCAAGAA GGCCCGGACG AAGGCCGCGA
12251 TTCAGCGGGA GGCGGTGCGC TTGTTCAGGG AACAGGGCTA CACCGCCACG
12301 ACCATCGAGC AGATCGCCGA AGCCGCCGAG GTCGCTCCCA GCACCGTCTT
12351 CCGCTACTTC GCGACCAAGC AGGACCTGGT CTTCTCGCAC GACTACGATC
12401 TGCCCTTCGC GATGATGGTC CAGGCCCAGT CACCCGACCT GACGCCGATC
12451 CAGGCCGAGC GGCAGGCCAT CCGCTCGATG TTGCAGGACA TCAGCGAGCA
12501 GGAACTGGCC CTGCAGCGCG AGCGGTTCGT CCTGATTCTC TCCGAGCCGG
12551 AGCTCTGGGG CGCCAGCCTC GGCAACATCG GCCAGACCAT GCAGATCATG



12601 AGTGAGCAGG TGGCCAAACG GGCCGGGCGC GACCCGCGGG ACCCCGCGGT
12651 CCGCGCCTAC ACCGGAGCCG TGTTCGGAGT GATGCTCCAG GTCTCGATGG
12701 ACTGGGCCAA CGATCCGGAC ATGGACTTCG CGACCACGCT GGACGAGGCA
12751 CTCCACTACC TGGAAGACCT GCGGCCCTGA CCGAAGGGGC GGGCGCACAC
12801 CACAGAGCCC GCCCCGGCCA GACGTCGTAC GAGGCGCCAT CGGCCGTCGC
12851 GTACGACCCC CGCGCCCCGG ATTCCCCCGC GGGGCGCGGG GTCAAGGGAA
12901 AAGAGACGAC CGCACGCGGC CACTGTTCCC CCGGCTGCCG CGTCCGGTCC
12951 AACCTGGCGT GCTCCGGCTT CCCTCGACGG AGCACGCCAG GGGTCTGTCC
13001 GGCCCTCTCC CGGCGGCTCC CGTCAGACGC CCGGCCCCGC CGTCAGCGCC
13051 TCGGTCACGA CGGCCGCCAC CTGCTCCTGA CAGCCGTCGA GGTAGAAGTG
13101 CCCGCCGGGC AGCACCCGCA GATCGAACGC CGCGCCGGTC CGCTCCCGCC
13151 ACGTGGCGGC CTGCTCCGGC GACGTCCGCT CGTCGGCGTC CCCGATCAGC
13201 GCCGTGATCG GGCAGTCGAG CCGGCCGGGT CCCGGCGCCT CGTAGGTGGC
13251 CACGGCCCGG TAGTCGGCCC GCAGCGCGGG CAGGACGAGC TCCTGCAGCT
13301 CGGGACTGCG GAAGAACCGC TCGTCGGTGC CGCCCATCGC CCGCAGATGG
13351 GCCAGGATGT CCGCGTCCCC GAACGCCCCC GAACGCCCCG CGGGACGGTA
13401 GGGCCGCGCG AGCCCCCCGG AAACGAACAG GTGCACGGGA AGGCCGGGCC
13451 CGGCCGGTCC CCGCAGCCGC CGCGCCACCT CGAACGCCAC GATGGCGCCC
13501 AGGCTGTGCC CGAACAGCGC GAATGGCTTC CCGTCGCACG GCAGGTGGGG
13551 CACGACGCCG TCGGCGAGCT CGGCCACCGA CGCCAGGCAC GGCTCCGCAT
13601 GACGGTCCTG CCGCCCCGGA TACTGCACGG CGAGCACCTC GACGCCGGGC
13651 GCGAGCAGCC CGGAGAGCCC GAAGTAGTAA CTCGCCGAAC CGCCCGCGAA
13701 CGGAAAGCAG ACCAGCCGCA CCGGCGCCTC TGCCGCAGCG TGGTACCGCC
13751 GCAACCACAC CCCGTTTCCG GTGGCTGCAC CGAACTCGTC ACCGATCTGT
13801 GGTGCCCGCG CCGCCGTGCC CCTGTCCATC GTTCTCCCTC TCCTCGCGTC
13851 GCTCCGCGGG CGCTGTCCTG CCCCGCCCCG AAAGCCCGAT GCCGGCCAAG
13901 CCCCGATGCT GGCCAAACCC CGATGCCGGC CAAGCCCCGA TGCTGGCCGC
13951 GGCCCATAGC GCCCGGCTAA AGCCGCAGGC GGCTAGCCGG GGTTTGGTTC



14001 GCCTTTAGAC AGCCCACCCA CGATGAGCCC GGTACTCGAA GCGATCTCCG
14051 ATTTCGGACC GGGAGCGCCG TTGATGTTTT GTGGCAGCCA GTTGTTCAGC
14101 GCCCGACCGC AGCTGACGTG ATGGCCGCAT CCGCGTCAGC GTCCCCCTCG
14151 GGACCGAGCG CAGGACCCGA CCCGATCGCC GTCGTCGGGA TGGCCTGCCG
14201 CCTGCCCGGA GCACCTGACC CCGACGCGTT CTGGCGGCTG CTCAGCGAGG
14251 GGCGCAGCGC GGTGAGCACC GCACCGCCCG AGCGGCGGCG AGCCGACTCC
14301 GGCCTCCACG GGCCGGGCGG CTACCTGGAC CGGATCGACG GCTTCGACGC
14351 GGACTTCTTC CACATCAGCC CGCGCGAGGC CGTGGCGATG GACCCCCAGC
14401 AGCGGCTGCT CCTCGAACTG AGCTGGGAGG CCCTCGAAGA CGCGGGCATC
14451 CGGCCGCCCA CCCTGGCGCG CAGCCGCACC GGCGTCTTCG TCGGCGCGTT
14501 CTGGGACGAC TACACCGACG TCCTGAACCT GCGGGCGCCG GGCGCCGTCA
14551 CCCGCCACAC CATGACCGGC GTGCACCGCA GCATTCTGGC CAACCGCATC
14601 TCGTACGCGT ACCACCTGGC CGGTCCGAGC CTCACCGTCG ACACCGCACA
14651 GTCCTCCTCG CTCGTCGCCG TCCACCTGGC CTGCGAGAGC ATCCGCAGCG
14701 GCGACTCCGA CATCGCCTTC GCGGGCGGCG TCAACCTCAT CTGCTCGCCG
14751 CGCACCACCG AGCTGGCCGC GGCCCGCTTC GGCGGTCTCT CGGCCGCAGG
14801 CCGCTGCCAC ACCTTCGACG CCCGCGCCGA CGGTTTCGTA CGCGGCGAGG
14851 GCGGCGGCCT CGTGGTGCTC AAGCCCCTCG CGGCGGCACG GCGCGACGGC
14901 GACACGGTGT ACTGCGTGAT CCGGGGGAGC GCCGTCAACA GCGACGGTAC
14951 GACCGACGGA ATCACCCTGC CCAGCGGGCA GGCGCAGCAG GACGTGGTGC
15001 GCCTCGCCTG CCGACGGGCG CGGATCACGC CGGACCAGGT GCAGTACGTC
15051 GAACTGCACG GCACCGGCAC GCCCGTCGGG GACCCGATCG AGGCCGCCGC
15101 GCTCGGCGCC GCCCTCGGGC AGGACGCCGC CCGCGCCGTG CCGCTGGCCG
15151 TCGGCTCCGC CAAGACGAAC GTCGGCCACC TCGAAGCCGC CGCCGGAATC
15201 GTCGGACTGC TCAAGACCGC CCTGAGCATC CACCACCGGC GGCTGGCGCC
15251 GAGCCTGAAC TTCACCACCC CCAATCCGGC CATCCCGCTC GCCGACCTCG
15301 GCCTGACCGT CCAGCAGGAC CTGGCCGACT GGCCGCGCCC CGAACAGCCC
15351 CTGATCGCCG GGGTGTCGTC CTTCGGCATG GGCGGCACGA ACGGTCACGT



15401 TGTCGTGGCG GCGGCGCCCG ATTCGGTGGC GGTACCTGAG CCGGTGGGGG
15451 TGCCTGAGCG GGTGGAAGTG CCTGAGCCGG TGGTGGTTTC TGAGCCGGTG
15501 GTGGTGCCGA CGCCATGGCC CGTGAGCGCT CACAGCGCTT CCGCGCTGCG
15551 CGCGCAGGCC GGTCGCCTGC GGACGCACCT CGCCGCCCAC CGCCCCACCC
15601 CCGACGCCGC GCGGGTCGGC CACGCGCTCG CCACCACCCG TGCGCCCCTC
15651 GCCCACCGCG CGGTCCTGCT CGGCGGCGAC ACCGCCGAAC TGCTGGGCTC
15701 CCTGGACGCG CTGGCCGAGG GCGCGGAGAC CGCGTCCATC GTGCGCGGCG
15751 AGGCGTACAC CGAGGGCAGG ACGGCCTTCC TCTTCAGTGG GCAGGGAGCG
15801 CAACGCCTCG GCATGGGGCG GGAGTTGTAT GCCGTGTTCC CCGTCTTCGC
15851 CGACGCTCTC GACGAGGCGT TCGCCGCCCT GGACGTACAT CTGGACCGCC
15901 CACTGCGCGA GATCGTCTTG GGCGAGACCG ACTCGGGTGG GAACGTCTCG
15951 GGTGAGAATG TCATCGGCGA GGGTGCCGAC CATCAGGCAC TCCTCGACCA
16001 GACCGCCTAC ACCCAGCCCG CGCTCTTCGC GATCGAGACG AGCCTGTACC
16051 GGCTGGCAGC CTCCTTCGGC CTGAAGCCGG ACTACGTCCT CGGCCACTCG
16101 GTCGGCGAGA TCGCCGCCGC GCACGTCGCC GGTGTCCTCT CGTTGCCGGA
16151 CGCGAGCGCT CTGGTGGCCA CGCGGGGACG GCTCATGCAG GCGGTTCGCG
16201 CGCCCGGCGC GATGGCCGCG TGGCAGGCCA CGGCGGACGA GGCGGCCGAA
16251 CAGCTCGCCG GGCACGAGCG GCACGTCACC GTGGCCGCCG TCAACGGCCC
16301 CGACTCCGTG GTCGTCTCCG GCGACCGCGC CACCGTCGAC GAACTGACCG
16351 CCGCCTGGCG GGGACGCGGC CGCAAGGCCC ACCACCTGAA GGTCAGCCAC
16401 GCCTTCCACT CCCCGCACAT GGACCCCATC CTCGACGAGC TGCGCGCGGT
16451 CGCCGCCGGC CTGACCTTCC ACGAGCCGGT CATTCCCGTC GTCTCCAACG
16501 TCACCGGTGA ACTGGTGACC GCGACCGCGA CCGGGAGCGG CGCCGGGCAG
16551 GCCGACCCCG AGTACTGGGC GCGGCATGCG CGCGAGCCCG TGCGGTTCCT
16601 GTCCGGGGTG CGGGGGCTGT GCGAGCGCGG GGTGACCACG TTCGTCGAGC
16651 TCGGCCCGGA CGCACCGCTG TCCGCGATGG CCCGCGACTG CTTCCCCGCC
16701 CCCGCGGACC GGAGCCGTCC GCGCCCCGCC GCCATCGCCA CATGCCGCCG
16751 CGGGCGCGAC GAGGTGGCCA CGTTCCTGAG GTCGCTGGCC CAGGCGTACG



16801 TCCGCGGCGC CGATGTCGAC TTCACCCGGG CCTACGGCGC CACCGCCACG
16851 CGCCGCTTCC CCCTCCCCAC GTATCCCTTC CAGCGCGAGC GCCATTGGCC
16901 TGCCGCTGCC GGGGTGGGGC AGCAGCCGGA GACCCCGGAA CTTCCGGAAT
16951 CCTCGGAGTC CTCGGAGCAG GCAGGGCATG AGCGGGAGGA GGGGGCGCGC
17001 GCGTGGGGCG GGCCTGAAGG GCGGCTTGCC GGGCTCTCCG TGAACGACCA
17051 GGAGCGGGTC CTCCTCGGCC TGGTCACCAA GCACGTGGCC GTCGTGCTCG
17101 GGGACGCCTC GGGCACGGTA CAAGCCGCCC GCACCTTCAA GCAGTTGGGC
17151 TTCGACTCGA TGGCCGCCGC CGAGCTGAGC GAACGGCTCG GCACGGAGAC
17201 GGGCCTGCCG TTGCCCGCCA CCCTCACCTT CGACTACCCG ACCCCTCTGG
17251 CCGTCGCCGC GCACCTGCGC GCGGAGCTCA CCGGTACGCC CGCCCCGGCC
17301 GGCTCCGCGC CCGCCACGGG CGCCCTCGGC GCGGGTGACC TCGGCACGGA
17351 CGAGGACCCG GTCGCCATCG TGGCCATGAG CTGCCGCTAT CCCGGCGGCG
17401 CAGGCACGCC CGAGGACCTG TGGCGGCTGG TCGCGGACGG CGCCGACGCG
17451 ATCGGAGACT TCCCCACCGA CCGCGGCTGG GACCTGGCGC GGCTGTTCCA
17501 CCCCGACCCC GACCGGTCGG GCACCAGCTG CACGCGGCAG GGCGGATTCC
17551 TGTACGACGC CGCCGACTTC GACGCCGAGT TCTTCGACAT CAGCCCGCGC
17601 GAGGCCCTGG CCGTCGACCC GCAGCAGCGG CTGCTCCTCG AGTGCGCCTG
17651 GGAGGCCTTC GAACGGGCGG GCCTGGACCC GCGGGCGCTC AAGGGCAGCC
17701 CCACCGGCGT GTTCGTCGGC ATGACGGGGC AGGACTACGG CCCCCGTCTG
17751 CACGAGCCGT CCCAGGCCAC CGACGGCTAT CTGCTGACCG GCAGCACGCC
17801 GAGCGTGGCC TCGGGCCGCC TGTCGTTCAG CTTCGGCCTT GAGGGGCCCG
17851 CCCTGACGGT GGACACGGCC TGCTCGTCGT CGCTGGTCAC GCTCCATCTC
17901 GCGGCGCAGG CGCTGCGGCG CGGCGAGTGC GACCTGGCCC TCGCCGGCGG
17951 CGCCACCGTC CTGGCCACGC CGGGCATGTT CACCGAGTTC TCGCGGCAGC
18001 GGGGCCTGGC CCCCGACGGC CGCTGCAAGC CGTTCGCGGC GGGCGCCGAC
18051 GGCACGGGCT GGGCCGAGGG CGTGGGCCTG GTCCTCCTCG AAAGGCTCTC
18101 CGAGGCCCGG CGCAAGGGGC ACGCCGTCCT CGCGGTGATC CGGGGTTCGG
18151 CGATCAACCA GGACGGCGCG AGCAACGGCC TGACCGCGCC CAACGGCCCC
18201 TCGCAGCAAC GCGTCATCCG TGCCGCGCTC GCGGCCGCCC GGCTCACCGC

18251 GGACGAGGTC GACGTAGTGG AGGCGCACGG CACCGGCACC ACGCTCGGCG
18301 ACCCGATCGA GGCGCAGGCC CTGCTCGCCA CGTACGGCCA AGGGCGTTCG
18351 GCGGAGCGGC CGTTGTGGCT CGGGTCGGTG AAGTCGAACA TCGGTCACAC
18401 GCAGGCCGCC GCGGGTGTCG CGGGCGTCAT CAAGATGGTG ATGGCGATGC
18451 GCCACGACCT GCTCCCCGCC ACCCTGCACG TCGACGAGCC GAGTGGCCAC
18501 GTGGACTGGT CCACCGGCGC GGTGCGACTG CTCACCGAGC CGGTCGTCTG
18551 GCCGCGCGGC GAACGTCCGC GCCGCGCCGC GGTGTCGTCC TTCGGCATCT
18601 CCGGCACGAA CGCGCACCTG GTGCTCGAAG AGGCGGGGCA GGACGAGTAC
18651 GTTGCGGGAG CCGCCGACGA CGCCGGGCCG GTGGACGGTG CTGTGCTGCC
18701 GTGGGTGGTT TCCGGACGGA CCGGAGCGGC GCTGCGCGAA CAGGCCCGCC
18751 GTTTGCGTGA GTTGGTGACC GGCGGCTCGG CCGATGTCTC TGTGTCCGGG
18801 GTGGGCCGGT CGCTGGTCAC CACGCGGGCG GTGTTCGAGC ACCGGGCCGT
18851 GGTCGTGGGC CGCGACCGGG ACACGCTGAT CGGCGGCCTC GAGGCCCTTG
18901 CGGCGGGTGA CGCGTCGCCG GACGTCGTGT GCGGGGTCGC GGGCGATGTC
18951 GGCCCCGGCC CGGTGCTGGT GTTCCCCGGG CAGGGCTCGC AGTGGGTGGG
19001 CATGGGAGCC CAACTCCTTG GCGAGTCCGC GGTGTTCGCG GCGCGGATCG
19051 ACGCGTGCGA GCAGGCGCTG TCCCCGTACG TCGACTGGTC ACTGACAGAG
19101 GTCCTGCGCG GGGACGGGCG CGAACTGTCG CGCGTCGACG TCGTCCAGCC
19151 CGTGCTGTGG GCGGTGATGG TCTCGCTCGC CGCCGTCTGG GCGGACCACG
19201 GCGTCACCCC GGCCGCCGTC GTCGGGCACT CCCAGGGAGA GATCGCCGCT
19251 GTGGTCGTCG CCGGCGCGCT CACCCTGGAG GACGGCGCCA AGATCGTGGC
19301 CCTGCGCAGC CGGGCGCTGC GTCAGCTCTC GGGCGGGGGC GCCATGGCCT
19351 CCCTCGGGGT GGGCCAGGAA CAGGCAGCCG AACTCGTCGA GGGCCACCCC
19401 GGAGTGGGCA TCGCCGCCGT CAACGGCCCG TCATCGACCG TCATTTCAGG
19451 CCCGCCCGAG CAAGTCGCCG CCGTCGTCGC CGACGCCGAG GCGCGCGAGC
19501 TGAGAGGCCG CGTCATTGAC GTGGACTACG CCTCGCACAG CCCCCAGGTC
19551 GACGCCATCA CCGACGAACT CACCCACACC CTGTCCGGCG TCCGCCCCAC
19601 CACGGCCCCG GTGGCGTTCT ACTCGGCCGT GACCGGAACC CGCATCGACA
19651 CGGCGGGCCT CGACACCGAC TACTGGGTCA CCAACCTGCG CCGCCCGGTC
19701 CGGTTCGCCG ACGCCGTCAC CGCGCTCCTC GCCGACGGCC ACCGGGTCTT
19751 CATCGAGGCC AGCAGCCACC CCGTCCTCAC CCTCGGCCTC CAGGAGACCT
19801 TCGAGGAGGC CGGGGTCGAC GCCGTCACCG TCCCCACCCT GCGGCGCGAG
19851 GACGGCGGCC GGGCACGCCT GGCCCGCTCG CTGGCACAGG CCTTCGGCGC
19901 CGGGTGCGCG GTGAGGTGGG AGAACTGGTT TCCGGCCACC GGTACGTCCA
19951 CCGTGGAGCT GCCGACGTAC GCCTTCCAGC GTCGCCGTTA CTGGCTGGAG
20001 GCCCCCACGG GCACCCAGGA CGCGGCGGGC CTGGGCCTCG CCGCTGCGGG
20051 GCACCCGCTC CTCGGGGCGG CCACCGAGAT CGCGGACGGC GACATCCGCC
20101 TGCTCACCGG CCGTATCAGC AGGCACAGCC ACCCCTGGCT CGCTCAGCAC
20151 ACCCTCTTCG GTGCCGCGGT CGTGCCCGCC TCCGTCCTCG CGGAATGGGC
20201 GCTGCGCGCC GCCGACGAGG CCGGCTGCCC GCGTGTCGAC GACCTCACGC
20251 TGCGCACCCC GCTGGTGCTG CCCGAGACCG CGGGCGTGCA GGTGCAGATC
20301 GTGGTCGGCC CGGCCGACGC GCGGGACGGG CACCGCGACT TCCACGTCTA
20351 CGCCCGCCCC GACGGCAAGG ACGCCTCTGA GGGCGAGGGC ATCGCCGAGG
20401 GCGAGGGTGC CTCTGAGGGC GAGGGTGCCT CCGGCGGCAC CGATGCGCCG
20451 TGGACCTGCC ATGCCGACGG CCGACTGGTC GCCGAGCCCA CCGGCACGGC
20501 CTCGGAGGAC TCCCCGGACA CGGTGTGGCC GCCGCCCGGC GCCGAACCCG
20551 TCGACCTGGG CGACTTCTAC GAGCGGGCCG CCGCCACCGG AGTCGGCTAT
20601 GGACCGGTCT TCACGGGGCT GCGCGCCCTG TGGCGGCGGG ACGGCGAGCT
20651 GTTCGCCGAG GCGGTGCTGC CGCAAGAAGC CCCGGAAACC GCCGGGTTCG
20701 GCATGCACCC GGCGCTCCTC GACGCCGCAC TGCACCCCGC ACTCCTCGGC
20751 GAGCGGCCGG CCGAGGAGGA CAAGGTGTGG CTGCCGTTCA CGCTGACCGG
20801 AGTGACCCTG TGGGCCACCG GTGCCACCTC TGTACGCGTC CGTCTCACCC
20851 CGCTGGACGA CGACCCCGAC GCGTCGGCGG ACGGGCGGGC CTGGCGGGTC
20901 GGCGTGAGCG ACCCGACCGG CGCGGAGGTG CTGACCTGCG AGGCCCTGGT
20951 CGCGGTGGCG GCGGGCCGCC GCGAGCTGCG GGCCGCGGGG GAGCGGGTGT
21001 CCGATCTGTA CGCGGTGGAG TGGGTGCCGG TGCCGGGCCC GGGGCCGGTG
21051 GGTGAGGGTG CTGACTTCTC GGGCTGGGCC GGTCTGGGGG AGTGCGGGGA
21101 GCGTTGGGAG TGCGTGGGGC GCGTGGAGCG CTGGTACGAG GACCTGGACG
21151 CTCTCGGCGC GGCTGTCGAG GGTGGGGCTT CGGTGCCCTC TGTCGTTCTC
21201 GCCACCGCGG CTGCCGCCCC TGGTGGAGCG GGCGACGGAG CCGCCGATGC
21251 GCTGAGCGCG GTGCGGTGGA CCGGCGCGCT CCTCGATCAG TGGCTCGCCG
21301 ACGCGCGGTT CGCCGACGCC CGGCTGGTGG TGATCACGTC CGGCGCGGTC
21351 GCCACGGGTG ACGATTTCCT TCCCGACCCG GCCGCCGCGG CGGTACGAGG
21401 ACTGGTCGAG CAGGCGCAGG TCAGGCACCC CGGCCGCATC CTCCTCGTCG
21451 ACACGGAAGC CGGGGCCGGG CTCGGGGTCG GCGCCGGAGT GGATGACGCG
21501 CTCCTGGAAC AGGCCGTGGC CATGGCTCTC GGCGCCGACG AACCGCAACT
21551 CGCCCTGCGC GCGGGGCGGG TCCTGGCGCC CCGCCTCACC GCACCCCAGG
21601 ATGCGGCCGT CACCGAAGCG GCGCGACCGC TCGACCCGGA CGGCACCGTA
21651 CTCATCACAG GGCCGGCCGG TGCTCCGGTG GCCGACCTCG CCGAACACCT
21701 CGTACGCACC GGGCAGTGCA GGCATCTGCT GCTCCTGCCT GGAGACGGTG
21751 AACTGGAGGA AATGGCCGAG GAGTTGCGGG GCCTCGGCGC CACCGTGGAC
21801 CTGAGTACCG CCGACCCGGC GGACCCGACC GCCCTCGCCG AAGTGGTCGC
21851 CGCCGTCGAG GGGGACCATC CTCTTACGGG GGTCATCCAC GCCACCGGAG
21901 TCGTGGACGC GTTCGATCCC GGCGACTCGG CGAGCGACTT GATGATCGAC
21951 TCGGCGAGCG ATTCGTTCGC CGAGGCATGG TCGTCGAGGG CGGGCGTCAC
22001 CGCCGCACTG CACACCGCGA CCGCCCACCT TCCCCTGGAC CTGTTCGCCG
22051 TCCTGTCCCC GGCGGGCGCG GACCTGGGCA TTGCCCGGTC GGCGGCCGCC
22101 GCGGGCGCCG ACGCCTTCAG CGCGGCACTC GCCCTGCGCC GGCACACGAC
22151 CGTCACGACG GACACGACAG CCCCGCCGCG CACGACAGCC CCGCCGCGAA
22201 CGACAGCCTC GCCGCGCACG ACAGCCCTGT CGTCGTCGCG CACGACGGGC
22251 GTGGCCCTCG CCTACGGGCC GCCCACCGCG CCGAGGCCCG GCATCAAGGG
22301 GACGGCGCCC GGTCGGATCC CCGTGCTGCT CGACGCCGCT CGCGCTCACG
22351 GGGGCGGTTC GCCCCTGCTC GGGGCCCGCT TGGCCGCGCG TGCCCTGGCC



22401 GCCGAGTCCG CCGCCGAGGG CGTCGCCGGC CTGCCCGCGC CGCTGCGCGC
22451 GCTGGCAGTG GCCGCAGCCG CGGCCGGAGC ACCGACCCGG CGCACCGCCG
22501 CCGACCGCAA GCCCCCCGCG GACTGGCCGG CCCGACTGGC CCCCCTGTCC
22551 GCCCCCGAAC AACTCCGTCT GCTCATCGAC GCCGTACGCA CCCACGCCGC
22601 CGCGGTCCTC GGCCGCACCG ACCCGGAAGC GCTGCGCGGG GACGCCACCT
22651 TCAAGCAGCT CGGCCTTGAC TCGCTGACCG CCGTGGAGCT GCGCAACCGG
22701 CTCGTGGAGG ACACCGGTCT GCGCCTGCCC ACCGCCCTCG TCTTTCGCTA
22751 CCCGACCCCC GCGGCGATCG CCGCGCACCT CCGCGAGCGG CTGACCAGCC
22801 CGAGCGAGAC GACCGCCACA CAGAGGTCCG GAGGGCAGAC GCCCGCAGCG
22851 GGGCAGGCGT CGTCCGCGCT CGCCCCCGGC GGATCGGCCG CCGGACCGCC
22901 CGCCGCAGAC ACCGTGCTGA GCGACCTGAC CCGCATGGAG AACACCCTCT
22951 CCGTGCTCGC CGCCCAGCTG CCCCACACCG AGACGGGTGA GATCACCACC
23001 CGGCTCGAAG CGCTCCTCAC GCGCTGGAAG ACCACGAACG CCACGGCGAA
23051 CGACAGCGGC GACGGCAACG GCGGCGATGA CGACGCCGCC GAACGCCTCA
23101 AGGCCGCGTC CGCCGACCAG ATCTTCGACT TCATCGACAA CGAGCTTGGT
23151 GTCGGGCACG GCACCTCGCG CGTGACCCCC ACTCCGAAGG CCGGGTGACC
23201 GCACATGGCG AGTGAAGAGC AACTGGTCGA ATATCTGCGC AGGGTGACCA
23251 CCGAGCTCCA TGACACGCGT CGGCGCCTGG TGCAGGAGGA GGACCGCAGG
23301 CAGGAACCGG TGGCCCTGGT CGGCATGGCC TGCCGCTTCC CGGGCGGCGT
23351 GGCCTCACCG GAGGACCTCT GGGACCTGGT CGCCGCGGGC AAGGACGCCA
23401 TCGAGGACTT TCCCACCGAC CGGGGCTGGG ACCTGGAGGC GCTCTACGAC
23451 CCGGACCCGG CCGCGTACGG GACCAGCTAT GTCCGCCACG GCGGGTTCGT
23501 GGACGACGCG GGCTCCTTCG ACGCCGACTT CTTCGGCATC AGCCCGCGAG
23551 AAGCCCTGGC GATGGACCCG CAGCAGCGGC TGATGCTGGA GACGTCCTGG
23601 GAGCTGTTCG AGCGCGCCGG CATCGAACCC GTCTCCCTCA AGGGCAGCCG
23651 TACGGGCGTC TACGCCGGGG TGTCCAGCGA GGACTACATG TCCCAACTGC
23701 CCCGCATCCC CGAGGGGTTC GAGGGGCACG CCACCACCGG CAGCCTCACC
23751 AGCGTCATCT CGGGCCGGGT CGCGTACAAC TACGGCCTCG AAGGCCCGGC
23801 CGTCACCGTC GACACAGCCT GTTCCGCCTC GCTCGTCGCC ATCCACCTGG

23851 CGAGCCAGGC GCTGCGCCAG CGTGAGTGCG ACCTCGCCCT CGCGGGCGGT
23901 GTGCTCGTAC TGTCCAGCCC GCTCATGTTC ACCGAGTTCT GCCGCCAGCG
23951 GGGCCTTGCT CCCGACGGCC GCTGCAAGCC GTTCGCCGCC GCGGCGGACG
24001 GCACCGGCTT CTCGGAGGGC ATCGGTCTGC TCCTCCTGGA GCGCCTGTCC
24051 GACGCGCGCC GCAACGGCCA CAAGGTGCTC GCGGTGATCC GCGGCTCCGC
24101 CGTCAACCAG GACGGCGCGA GCAACGGCCT GACCGCCCCC AACGACGCCG
24151 CGCAGGAACA GGTCATCCGC GCCGCCCTCG ACAACGCCCG CCTCACCCCG
24201 TCCGAGGTGG ACGCCGTCGA GGCGCACGGC ACCGGCACCA AACTGGGCGA
24251 CCCCATCGAG GCCGGAGCGC TGCTCGCCAC CTACGGGCAA CACCGCGCCC
24301 GGCCCCTCCT CCTCGGCTCC CTCAAGTCCA ACATCGGCCA CACCCACGCC
24351 ACCGCGGGCG TCGCCGGTGT CATCAAGACC GTCATGGCGA TCCGCAACGG
24401 TCTGCTCCCC GCCACCCTCC ACGTCGAGGA ACTGAGCCCG CACGTCGACT
24451 GGGACGCGGG CGCGGTCGAG GTCGTCACGG AGCCCACCCC GTGGCCCGAG
24501 ACCGGCCACC CCCGGCGCGC GGGCGTCTCC GCGTTCGGGA TCTCCGGGAC
24551 GAATGCGCAC TTGATCCTGG AGGAGGCGCC GCCGGAGGAG GATGTGCCCG
24601 CCCCCGTGGT TGTGGAGTCG GGCGGGGTCG TTCCGTGGGT GGTGTCCGGG
24651 CGGACGCCGG AGGCGCTGCG TGAACAGGCC CGGCGACTCG GCGAGTTCGT
24701 GGCAGGCGAC ACGGACGCAC TGCCGAACGA GGTCGGCTGG TCCTTGGCCA
24751 CGACCCGGTC GGTGTTCGAG CACCGGGCTG TGGTCGTGGG GCGTGACCGG
24801 GATGCGTTGA CGGCTGGCCT GGGGGCGTTG GCTGCGGGTG AGGCTTCGGC
24851 GGGTGTGGTG GCCGGGGTGG CCGGTGATGT GGGTCCTGGG CCGGTGTTGG
24901 TGTTTCCGGG GCAGGGGGCG CAGTGGGTGG GCATGGGTGC CCAGCTGTTG
24951 GACGAGTCTG CGGTGTTCGC GGCGCGGATC GCGGAGTGTG AGCGGGCCCT
25001 GTCGGCGCAT GTGGACTGGT CGCTGAGTGC GGTGTTGCGC GGGGACGGGA
25051 GTGAGCTGTC CCGGGTGGAA GTGGTGCAGC CGGTGCTGTG GGCGGTGATG
25101 GTCTCGCTGG CTGCGGTGTG GGCGGATTAC GGGGTCACTC CGGCTGCCGT
25151 GATCGGGCAC TCGCAGGGTG AGATGGCTGC CGCGTGTGTG GCGGGGGCGC
25201 TGTCGCTGGA GGATGCGGCG CGGATCGTAG CGGTACGCAG TGACGCGCTT
25251 CGTCAGCTGC AAGGGCACGG CGACATGGCC TCGCTCAGCA CCGGTGCCGA
25301 GCAGGCCGCT GAGCTGATCG GTGACCGGCC GGGCGTGGTC GTCGCGGCGG
25351 TCAATGGGCC GTCGTCTACG GTGATTTCAG GGCCGCCGGA GCATGTGGCA
25401 GCCGTGGTCG CGGATGCGGA GGCACGTGGT CTGCGCGCCC GTGTCATCGA
25451 CGTCGGCTAT GCCTCGCATG GCCCCCAGAT CGACCAGCTC CACGATCTGC
25501 TGACCGAACG CCTGGCCGAC ATCCGGCCCA CGAACACGGA CGTGGCCTTC
25551 TATTCGACGG TCACCGCCGA GCGCCTGACG GACACCACGG CCCTGGACAC
25601 GGATTACTGG GTCACCAACC TCCGTCAGCC CGTCCGGTTC GCCGACACCA
25651 TCGAAGCCCT TCTCGCGGAC GGCTACCGCC TGTTCATCGA GGCCAGCGCC
25701 CACCCCGTGC TGGGCCTGGG CATGGAGGAG ACCATCGAGC AGGCGGACAT
25751 GCCCGCCACC GTCGTCCCCA CCCTCCGCCG CGACCACGGC GACACCACCC
25801 AGCTCACCCG CGCCGCCGCC CACGCCTTCA CCGCCGGCGC CGATGTCGAC
25851 TGGCGGCGCT GGTTCCCGGC CGACCCCGCC CCCCGCACGA TCGATCTCCC
25901 CACCTACGCC TTCCAGCGCC GCCGCTACTG GCTGGCCGAC ACAGTGAAGC
25951 GGGACAGCGG ATGGGACCCG GCCGGGTCGG GGCATGCCCA GTTGCCGACC
26001 GCGGTCGCCC TCGCCGACGG GGGAGTGGTG CTGAACGGCC GGGTGTCCGC
26051 CGAGCGCGGT GGCTGGCTGG GCGGGCATGT GGTGGCGGGG ACGGTTCTGG
26101 TGCCGGGTGC GGCGTTGGTG GAGTGGGTGT TGCGGGCCGG TGATGAGGCG
26151 GGTTGCCCCT CGCTTGAGGA GTTGACGCTC CAGGCGCCGT TGGTGTTGCC
26201 CGAGTCGGGT GGGTTGCAGG TTCAGGTGGT CGTGGGTGCG GCTGATGAGC
26251 AGGGCGGCCG TCGTGACGTA CATGTGTATT CGAGGTCTGA GCAGGACGCG
26301 TCGGCGGTGT GGCAGTGCCA TGCCGTCGGT GAGCTCGGGC GCGCGTCGGT
26351 GGCGCGGCCG GTGCGGCAGG CCGGGCAGTG GCCTCCGGCG GGGGCCGAGC
26401 CGGTGGAGGT GGGCGGCTTC TACGAGGGGG TCGCGGCCGC CGGTTACGAG
26451 TACGGTCCGG CGTTCCGTGG GCTGCGCGCG ATGTGGCGGC ACGGTGATGA
26501 CCTCCTTGCG GAGGTCGAGC TGCCGGAGGA GGCCGGTTCG CCGGCCGGTT
26551 TCGGCATCCA CCCGGCGCTG CTGGACGCCG CCCTGCACCC GCTGCTCGCA
26601 CAGCGGAGCC GGGACGGGGC CGGGGCGGGG GCCCACGGCG GGCAGGTGCT
26651 GCTGCCTTTC AGCTGGAGCG GTGTTTCCCT GTGGGCCAGC GAGGCCACCA
26701 CTGTGCGGGT GCGGCTCACC GGGCTGGGAG GAGGGGACGA CGAGACGGTG
26751 TCCCTGACGG TAACCGACCC CGCCGGTGGC CCCGTGGTGG ACGTGGCAGA
26801 GCTGCGGTTG CGGTCGACGA GCGCCCGGCA GGTGCGGGGT TCGGCAGGCC
26851 CCGGCGCGGA CGGGCTCTAC GAGCTGCGGT GGACACCGTT GCCCGAGCCG
26901 CTTCCCGTAC CGGCCCCCGC GAACGGTCGC GATGTGGCCG CCGACCTGTC
26951 CGGATGCGCG GTGCTCGGCG AACTGGTCGC GGAACCGGGC CCGGGCATCG
27001 ACCTGGAGGG CTGCCCCTGC TACCCGGGCG TCGGCGCGCT CGCCGACAAC
27051 GCCTCCCCGC CCTCCATGAT CCTCGCCCCC GTGCACAGCG ACACCACAGG
27101 CGGCGACGGA CTCGCCCTGA CGGAACGGGT GTTGCGCGTC ATCCAGGACT
27151 TCCTGGCTGC ACCGAGTCTG GAACAGAAAC AGACGCGCCT GGCCTTCGTG
27201 ACCCGGGGCG CGGCGGACAC AGGTAGCACG ACGGGAGGCT CGGCTGCCCC
27251 GGCAGAGGCA GTCGACCCGG CGGTCGCGGC CGTATGGGGC CTAGTACGCA
27301 GCGCGCAGTC GGAGAACCCC GGCCGCTTCG TACTGCTGGA CACCGACGCG
27351 CCCCTCGACC AGGCGTCCGT TGCCCCTCTC GTGGACGCGG TGCGGTCTGC
27401 CGTGGAGGCG GACGAGCCCC AAGTCGCCCT GCGCGGGGGA CGGTTGCTCG
27451 TGCCCAGGTG GGCGCGGGCC GGCGAGCCCG TCGAGCTGGC CGGGCCGGCC
27501 GGAGCGCGGG CGTGGCGGCT GGTGGGCGGA GACTCCGGGA CGCTGGAGGC
27551 CGTCGTGGCG GAGGCTTGCG ACGACATTGT GCTGCGCCCG TTGGCGCCGG
27601 GCCAGGTCCG CGTCGCCGTC CATACGGCCG GGGTCAATTT CCGTGACGTC
27651 CTGATCGCCC TGGGCATGTA CCCGGACCCG GACGCGCTGC CCGGCACCGA
27701 GGCGGCCGGC GTGGTGACGG AGGTCGGGCC GGGCGTCACC CGTCTGTCGG
27751 TGGGCGACCG CGTGATGGGC ATGATGGACG GCGCCTTCGG CCCGTGGGCC
27801 GTCGCCGACG CGCGCATGCT GGCCCCGGTC CCGCCCGGCT GGGGCACCCG
27851 GCAGGCGGCC GCCGCTCCCG CCGCGTTCCT GACGGCTTGG TACGGGCTGG
27901 TGGAGCTGGC CGGTCTGAAG GCGGGCGAGC GTGTGTTGAT CCATGCCGCC
27951 ACGGGTGGTG TGGGGATGGC GGCGGTGCAG ATCGCCCGGC ATGTGGGTGC



28001 CGAGGTGTTC GCCACCGCGA GTCCGGGCAA GCACGCCGTG CTGGAGGAGA
28051 TGGGCATCGA CGCCGCCCAC CGCGCCTCGT CGCGCGACCT CGCCTTCGAG
28101 GACGCCTTCC GGCAGGCCAC CGACGGCCGT GGCGTGGACG TCGTCCTCAA
28151 CAGCCTCACC GGTGAACTGC TCGACGCGTC CCTGCGATTG CTCGGCGACG
28201 GCGGGCGCTT CGTGGAGATG GGCAAGAGCG ATCCGCGCGA CCCCGAGCTG
28251 GTCGCGCTGG AGCACCCCGG GGTGTCGTAC GAGGCCTTCG ACCTCGTCGC
28301 CGACGCCGGG CCCGAGCGGC TCGGGCTGAT GCTCGACAGG CTCGGCGAGC
28351 TCTTCGCCGG CGGATCACTG GTACCGCTGC CGGTCACCGC ATGGCCGCTG
28401 GGGCGGGCGC GAGAGGCGCT CCGCCACATG AGTCAGGCGA GGCACACCGG
28451 CAAGCTGGTG CTCGACGTGC CCGCGCCGCT CGACCCCGAC GGCACCGTCC
28501 TCGTCACCGG GGGTACCGGC ACCATCGGCG CCGCCGTGGC CGAACACCTG
28551 GCGCGTACCG GGGAGAGCAA GCACCTGCTC ATCGTCAGCC GCAGCGGGCC
28601 GGCCGCCCAC GGCGCCGAGG AACTTGTCTC TCGTATAGCC GAGTTCGGGG
28651 CCGAAGCCAC CTTCGTCGCT GCCGACGTGA GTGAGCCCGA CGCGGTCGCC
28701 GCCCTGATCG AAGGGATCGA TCCGGCCCAT CCGCTGACCG GTGTCGTGCA
28751 TGCCGCCGGA GTACTCGACA ACGCTCTGAT CGGCTCCCAG ACCACCGAAA
28801 GCCTCACCCG CGTATGGGCG GCGAAGGCCG CCGCCGCGCA GCAACTCCAC
28851 GAGGCCACGA GGGAGTCGAG GCTGGGACTG TTCGTGATGT TCTCCTCCTT
28901 CGCCTCCACC ATGGGCACCC CAGGGCAGGC CAACTACTCC GCCGCCAACG
28951 CCTATTGCGA CGCGCTGGCC GCTCTCCGAC GCGCGGAGGG GCTCGCCGGC
29001 CTGTCCGTGG CGTGGGGGTT GTGGGAGGCC ACCAGCGGCC TGACCGGGAC
29051 GTTGTCGGCG GCCGACCGGG CCCGCATCGA CCGGTACGGC ATCAGGCCGA
29101 CCAGCGCGGC ACGCGGCTGC GCCCTGCTGG CAGCGGCACG CGCCCACGGG
29151 CGCCCCGACC TGCTCGCCAT GGACCTGGAC GCCCGCGTAC CCGCCGCGTC
29201 CGACGCTCCG GTCCCCGCCG TGCTGCGCAC TCTGGCGGCC GCCGGAGCGC
29251 CCGCCACCGC CCGTCCCACC GCGGCGGCGG CCGCTGACGG GGCGACGGAC
29301 TGGTCCGGCA GGCTCGCCGG CCTCACCGAG GAGGCACGGC TCGAACTCCT
29351 CACCGAGTTG GTGTGCACCC ACGCGGCAGG GGTGCTCGGG CACGCCGACG



29401 CGGGCGCGGT CCAGGTGGAC GCGCCGTTCA AGGAACTCGG CTTCGACTCG
29451 CTGACCGCCG TCGAACTGCG CAACCGGATC GCCGCCGCGA CCGGCCTGAA
29501 ACTGCCCGCC GCCCTCGTCT TCGACTACCC GCAGGCTCGC GTTCTCGCCG
29551 CCCACCTGGC CGAACGGCTC GTCCCGGAGG GCGCGGGGGC CATGGGCGGT
29601 GTGAGCGGTG CGGAGGGCGT GAGGGACGCG TACGGGGCAG GCGGTCCGGG
29651 CGGCGACATG ACCGCCCAGG TCTTGCTGGA GGTGGCCCGC GTCGAGCACA
29701 CCCTGTCCGC CGCCGTCCCG CACGGCCTGG ACCGGGCGGC CGTCGCGGCC
29751 CGCCTGGAGG CGCTGCTCGC CCGCTGCACG GCGACGACGG CGGCCACGGG
29801 GGCCGCGGGA GCCGCGGTGG AGGGTGACGG CGACAGCGAC GGCGACGGCG
29851 CCGTGGATCA GCTGGAGACG GCCACCGCCG AGCAAGTACT GGACTTCATC
29901 GACAACGAAC TCGGGGTGTG AGCCGCGTGC CGGCCGCACA CCAGGCGATC
29951 ACGGGCGGGG AGCTGCAGCG CACATGGTGA GCGAAGAGAA ACTGGTCGAC
30001 TACCTCAAGC GTGTCTCCGC GGACCTGCAC GCCACCCGGC AGCGGCTGCG
30051 CGAGGCGGAG GAGCGCGGCC AGGAACCCGT GGCCGTGGTG GAGGCCGCCT
30101 GCCGCTACCC CGGCGGCATC CGCACCCCCG AAGACCTGTG GGACCTGGTC
30151 GCCGCGGGCG GCAACGCCCT GGGCGCCTTC CCCGACAACC GCGGCTGGGA
30201 CCTGCGACGC CTCTTCCACC CCGACCCCGA CCACCCCGGG ACGACCTACG
30251 CCCGCGAGGG CGGCTTCCTC CACGACGCCG ACCTGTTCGA CCCGGAGTTC
30301 TTCGGCATCA GCCCCCGCGA GGCCGCGGTC CTCGACCCGC AGCAGCGACT
30351 GCTCCTGGAG TGCGCCTGGG AGGCACTGGA GCGCGCGGGC ATCGACCCGC
30401 GGTCCCTCCA GGGCAGCCGT ACCGGCGTGT ACGCGGGTGC CGCCCTGCCC
30451 GGCTTCGGCA CCCCGCACAT CGACCCCGCC GCCGAGGGCC ACCTGGTCAC
30501 CGGCAGCGCC CCGAGCGTCC TCTCGGGCCG GCTCGCCTAC ACCTTCGGCC
30551 TCGAAGGGCC CGCGGTGACG ATCGACACCG CCTGCTCGTC GTCGCTCGTC
30601 GCCGTGCACC TGGCGGCCCA CGCGCTGCGG CAGCGCGAGT GCGATCTGGC
30651 GCTCGCGGGC GGTGTCACCG TCATGACCAC CCCGTACGTG TTCACCGAGT
30701 TCTCGCGCCA GCGCGGCCTG GCCGCCGACG GCCGGTGCAA GCCCTTCGCG
30751 GCCGCCGCGG ACGGCACGGC CTTCTCCGAG GGCGCCGGAC TCCTCGTACT



30801 GGAACGCCTC TCCGACGCCC GCCGGGCCGG CCACCGGGTG CTCGCCGTCA
30851 TCCGCGGCTC GGCCGTCAAC CAGGATGGCG CGAGCAACGG CCTCACCGCC
30901 CCCAACGGCC CCGCCCAGCA GCGCGTGATC CGCGCCGCCC TCGCCGGGGC
30951 GCGGCTCTCG CCCGCGGAGG TGGACGCGGT CGAGGCGCAC GGCACCGGCA
31001 CCCGGCTGGG CGACCCCATC GAGGCCGACG CGCTCCTCGC CACCTACGGT
31051 CAGGAGCGCC ACGGGGGCCG GCCGCTGTGG CTCGGCTCGG TGAAATCCAA
31101 CATCGGCCAC ACGCAGGGCG CGGCCGGTGC CGCGGGCCTG ATCAAGATGG
31151 TCCAGGCACT GCGGCACGAG ACGCTGCCCG CCACGTTGTA CGCCGACGAG
31201 CCCACCCCGC ACGCCGACTG GGAGTCGGGC GCGGTGCGCC TGCTCAGCGC
31251 GCCGGTCGCC TGGCCGCGCG GGGAGCACGG GGAGCACACC CGCAGGGCCG
31301 GCATCTCCTC CTTCGGCATC TCCGGCACGA ACGCCCACCT CATCCTGGAG
31351 GAGGCGCCCG CGGCCGACGC CGAAGGAGCG GGTGGCGACG GCGATGGCGA
31401 CGGGGGAGGG GTGCGGCCGG TGGTGCGGGT CGGCGCCACG GGCCCCCGCG
31451 AAGAGCAGGG CCAAGGACAG GGCCAAGAGC AGCACCAACA GCAACGTCAG
31501 CAGCGGCAGC GGTCGTCGAT GATGCCGACG CCGCACCTCC CGTGGCTGCT
31551 GTCCGCCCGC AGCCCCGCCG CGCTCCGCGC CCAGGCCGAC GCGCTGGCGA
31601 ACCATGTCGC CCACGCGGAC CACTCCATCG CCGACATCGG CGGCACACTG
31651 CTGCGCCGCA CCCTGTTCGA GCACCGGGCG GTCGTCCTCG GAACCGACCG
31701 TGATGAGCGT GCCGCAGCGC TTGCCGCCCT CGCGGCAGGA CGCGCACACC
31751 CCGCGCTGAC CCGGGCCGCA GGGCCGGCGA GGAACGGCGG CACCGCCTTC
31801 CTGTTCACCG GCCAGGGAAG CCAACGCCCA GGCATGGGCA GGCAGTTGTA
31851 CGACACCTTC GACGTCTTCG CCGAGTCGCT CGACGAGACC TGCGCCCGGC
31901 TCGACCCCCT GCTCGAACAG CCGCTGAAGC CCGTCCTGTT CGCCCCCGCC
31951 GACACCGCGC AGGCCGCCGT GCTGCACGGG ACCGGCATGA CGCAGGCCGC
32001 GCTGTTCGCC CTCGAAGTCG CCCTGTACCG CCAGGTCACC TCCTTCGGGA
32051 TCGCCCCCAG CCACCTGACC GGGCACTCCG TCGGCGAGAT CGCCGCCGCC
32101 CACGTCGCCG GGGTGTTCTC CCTGGCGGAC GCCTGCACGC TGGTCGCGGC
32151 CCGGGGCCGC CTCATGCAGG CCCTGCCCGC AGGTGGCGCC ATGCTCGCCG
32201 TCCAGGCGGC CGAGGACGAC GTACTGCCGC TGCTCGCCGG GCAGGAGGAA

32251 CGTCTCTCCC TCGCCGCCGT CAACGGCCCC ACCGCCGTCG TCGTGTCCGG
32301 TGAGGCCGCT GCCGTCGGGG AGGTGGAGAA GGCGCTGCGC GGGCGCGGAC
32351 TGAAGACCAA GCGGCTCAAC GTCAGTCACG CCTTCCACTC GCCGCTCATC
32401 GAGCCGATGC TCGACGACTT CCGCGAAGTG GCCCGCGGGC TGACCTTCCA
32451 CGCGCCGACG CTGCCCGTCG TCTCCAACCT CACCGGCCGC CTCGCCGACG
32501 CGGAGCTGAT GGCCGACGCC GAGTACTGGG TGCGGCACGT ACGCCGGCCG
32551 GTGCGGTTCC ACGACGGGCT GCGCGCTCTC AGCGAGCAAG GCGTCGTGCG
32601 CTACCTGGAG TTGGGGCCCG ACCCGGTCCT CGCCACCATG GTCCAGGACG
32651 GTCTCCCGGC CCCGGCGGAG GGAGAGGAGC CCGAGCCGGT CGTCGCCGCG
32701 GCGCTGCGCT CCAAGCACGA CGAGGGACGC ACCCTGCTGG GTGCCGTCGC
32751 CGCGCTCCAC ACCGACGGAC AGCCGGCCGA CCTCACCGCC CTCTTCCCCG
32801 CCGACGCCGG GCAAGTGCCG CTCCCCACCT ACCGGTTCCA GCGGCGACGG
32851 TACTGGCGCG TCGCGCCCGA CGCCGCCGCG CCGGCCCGCG CCGCCGGCCT
32901 CCAGGAGACC GGCCACCCGC TGCTGCCCGC CGTCATCCGG CAGGCCGACG
32951 GCGGCATCCT GCTCGCGGGA CGCCTGTCCC TGCGTACGCA TCCATGGCTC
33001 GCCGACCACA CCATCGCGGG CGGCGTCCCG CTGCCCGCCA CCGCCTTCGT
33051 CGAACTCGCC CTGCTCGCAG GGCGGCACGC CGCCTGCGAC ACGATCGACG
33101 ATCTGACGCT GGAGACGCCG CTGCTGCTCG ACGACACCGG TACCGGTGTC
33151 GGGGCGGCTG TGGGCGCGGG CGCCGATGCC CTCGTCGATG CCATAGAAGT
33201 GCAGCTTGCC CTCGGCGCTC CCGACGGTTC CGGCCGCCGT GCTCTCACCG
33251 TCCACTCCCG TCCTGCCGAC GATGCGGCTG ACGACGGCGA CGCGGCCGAC
33301 GCGGCCGATG CGGCAGGCCG GGGAGGCCCG GGCGGCTCGG GTGACCTGGG
33351 CGATCCTGGC GATCCGGGCG ATCTGGGCGA CGGCGGGGGC TCCCGCGGCT
33401 GGCGCCGTCA CGCCACCGGC ATCCTCAGCG CCGGCCCGGC CGCCGAACCG
33451 GCCGCCCCCG ACGCCGCTCC CTGGCCGCCG GCCGACGCCA CCGCCCTCGA
33501 CGTCGACGCG CTGTACGCCC GGCTCGACGC GCAGGGCTAC AGCTACGGGC
33551 CCGCCTTCCG GGCCGTCCAC GCCGCCTGGC GGCACGGCGA CGACCTCTAC
33601 GCCGATGTCC GCCTCGCCGA CGAACAGCGC GCTGAAGCCG ACGCGTTCGC
33651 CCTCCACCCG GCCCTGCTCG ACGCCGCCCT GCATGCCGTC GACGAGCTGT
33701 ACCGCGGCAG TGAGGGGCGG GGGCAGGAGC AGGGGCAGGG TGGTCAGGAG
33751 CCGGAGCAGG GCCGTGGCGA CGCGGACGCC CCGGTACGGC TGCCGTTCTC
33801 CTTCAGCGAC ATACGCCACC ACGCCACCGG GGCCACACGG CTGTGGGTCC
33851 GCCTCAGCCC CCAGGGCGAC GATCGGCTGC GGCTGTCCCT GACCGACGGC
33901 GAGGGCGGGC AGGTCGCGAC AGTCGACGCC CTCCAACTGC GGTTGATCCC
33951 CGCCGACCGG TGGCGCGCGG CCCGCCCCAC CACAGCCGCC CCCCTGTACC
34001 ACCTGGACTG GCACGAGCTG CCGTTGCCCG AGCCGGCCGA GACGGACCCG
34051 GCCGCCCACT CCTGGGCTGT GCTCGGAGCG CACGACGCGG GCCTCGCTCC
34101 CGCCGCGCAC TACCCGGACC TGGCGGCCCT GAAAGCCGCC GTCGAGGCCG
34151 GCGAGCCCGT GCCGGACATC GTCTTCGCAC CGTTCCCCGC GCAGGGGACG
34201 GAGACCGATG TCCCGGCTCA GGTACGAGCC CACGCCCGGC ACGCCCTGGA
34251 GCTGCTGCGC GACTGGCTCA CCACGGAAGC TTTCGCCGCC GCCCGCCTCG
34301 TCGTCCTCAC GACCGGTGCG GTCACCGCCC GCCCAGAGGA CGGGCCCGCC
34351 GACCTGGCCA CCGCACCTGT ATGGGGCCTG GTCCGAGCCG CCCAGGCCGA
34401 ACAACCCGAC CATGTCGTCC TGGTGGACAT CGACAAGGAC ATCGATAAGG
34451 ACACCGACGA GGAGACCGAC CAGGCCACCG ACGCGGGCAC CGCATCGCGC
34501 CACGCTCTGC CCGCCGCCTT GGCCGCGGCG GCCGCCCAAG CCGAGACACA
34551 GCTCGCCCTG CGCGCGGGCA CCGTGCTCGT GCCGCGCCTC GCCGTCGTCC
34601 CGCCCCGGAC CGACACCCCA GCGCTGCACG CCACCGCCCC GGAGAGCACC
34651 ACGGACACTG TGGACTCCAC GGGCATCGCG GGCGCTGCGG AATCCGGCGG
34701 CACCGTCCTG ATCACCGGCG GAACCGGCGG CCTCGGGCAG GCCGTCGCCC
34751 GTCACCTCGC CGCCGCGCAT GGCGCCCGCC ACCTGCTCCT CGTCAGCCGC
34801 AGGGGCGACG CCGCCGAGGG CGTCGCCGAG TTGCGCGCCG ACCTCGCGGA
34851 CGACGGCGTC GACGTACGCG TCGCCGCCTG CGACATCACC GACCGCGACG
34901 CGCTGGCCGG GCTCCTCGCG GACATCCCCG CCGCGCACCC GCTCACCGCG
34951 GTCGTGCACA CCGCGGGCGT CATCGACGAC AGCCTCATCA CGGCGATGAC



35001 CCCCGAGCGG CTCGACGCCG TCCTCGCACC CAAGGCCGAC GCGGCCTGGC
35051 ACCTGCACGA ACTCACCCGC GACAAGGACC TGTCGGCCTT CGTCCTCTTC
35101 TCCTCGGGCG CCTCCGTCCT CGGCAACGGC GGCCAGGCCA ACTACGCGGC
35151 CGCCAACACC TTCCTCAACA CCCTCGCCGA ACACCGCCGC GCGGCCGGCC
35201 TCGCCGCCAC CTCCGTGGCC TGGGGCCTGT GGGAGTCCGC GTCCGGCGGC
35251 ATGGCCGCCC GGCTCGGCGA CGCCGACCGC GCCCGCATCC ACCGCACCGG
35301 CGTGACGGGC CTGACCGACG AGCAGGCCCT GGCCCTCTTC GACGCGGCCC
35351 TGACCGCCGA GCACCCCACG GTCCTCGCCA CCCGCTTCGA CCGCGCCGTG
35401 CTGCGCGGCC AGGCCGCCGC CCGCACCCTG CAGCCCGCCC TGCGCGGCCT
35451 GGTACGCACT CCGCGCCCCA CCGCGTCCGC CGGGGCCATC GGGTCCACCG
35501 CAGCCACCGG GTCCGCCACG GACGAGAACG CGCCCTCCTC GTGGGCCGCC
35551 CGGCTCGCCC GGCTGTCCGC CGCCGACCGC GACCGCGCCC TCAACGAACT
35601 CATTCGCGAG CAGATCGCGA CCGTCCTGGC ACACCCCTCA CCCGACACCA
35651 TCGAACTGGG CCGCGCCTTC CAGGAGTTGG GCTTCGACTC GCTCACCGCC
35701 CTGGAACTCC GCAACCGCCT CTCCACGGCC ACCGGCATCC GGCTGCCCGC
35751 CACCCTCGTC TTCGACCACC CGAGCCCCAC CGCCCTCGTA CGCCATCTCC
35801 ACAGCCATCT CCCCGACGAG GCCCAGCACA CGTCCCCGAC CGCCCCCGGC
35851 GCCTCTGCGG AGGGCACCGC CGCCACGGCC ACCGGCATCG ACGACGACCC
35901 GATCGCCATC GTCGGCATGG CGTGCCGCTA CCCGGGCGGC GTGACCTCGC
35951 CCGAGCAGCT GTGGCAGCTC GTGGCCACCG GCACCGACGC CATCGGCCCG
36001 TTCCCCGAGG ACCGCGGCTG GGACACGGCC GGACTGTTCG ATCCCGACCC
36051 CGACCAGGTC GGCCACAGCT ACACCCGCGA AGGCGGCTTC CTCTACGACG
36101 CCGCCCGCTT CGACGCGGGC TTCTTCGGCA TCAGCCCGCG CGAGGCCGCC
36151 GCCACCGACC CGCAGCAGCG CCTGCTCCTG GAAACCGCCT GGCAGGCGTT
36201 CGAACACGCG GGCATCGACC CCGCCGCCCT GCGCGGCACC CCGTGCGGCG
36251 TCATCACCGG AATCATGTAC GACGACTACG GATCCCGCTT CCTCGCGCGC
36301 AAACCGGACG GCTTCGAGGG CCGCATCATG ACCGGCAGCA CGCCGAGCGT
36351 GGCCTCCGGC CGGGTCGCGT ACACCTTCGG CCTGGAGGGC CCCGCCATCA
36401 CGGTGGACAC CGCGTGCTCC TCCTCGCTGG TCGCGATGCA CCTGGCGGCG
36451 CAGGCGCTGC GGCAGGGCGA GTGCGAACTG GCCCTGGCCG GGGGTGTGAC 

36501 CGTGATGGCC ACCCCGAACA CCTTCGTGGA GTTCTCCCGC CAGCGCGGCC
36551 TGGCCCCCGA CGGCCGCTGC AAGCCGTTCG CCGCCGCGGC GGACGGCACC 

36601 GGCTGGGGCG AGGGCGCCGG ACTCGTCGTC CTGGAGCGCC TCTCCGACGC
36651 GCGCCGCAAG GGACACCGCG TCCTCGCCCT GCTGCGCGGT TCGGCCGTGA 
36701 ACCAGGACGG CGCGAGCAAC GGCATGACCG CCCCGAACGG TCCCTCGCAG 
36751 GAACGGGTCA TCCGCACCGC CCTGGCCGGC GCGGGCCGTG GTCCCGAGGA 
36801 CATCGACGTG GTGGAGGCGC ACGGCACCGG CACCACGCTC GGCGACCCGA
36851 TCGAGGCGCA GGCCCTGCTC GCCACGTACG GGCAGGGGCG CCCGGAGGAC
36901 CGCCCGCTCT GGCTCGGCTC GGTGAAGTCG AACATCGGCC ACACGCAGGC 
36951 CGCCGCCGGT GTCGCGGGCG TCATCAAGAT GGTCATGGCA CTGCGCCACG 
37001 AGCAACTGCC CACGACCCTG CACGCCGACG AGCCGACCCC CCACGTGCAA 
37051 TGGGACGGCG GCGGCGTACG TCTCCTGACC GAACCGGTCC CGTGGTCGCG 
37101 CGGCGAGCGC ACGCGGCGCG CCGGGGTGTC GTCCTTCGGG ATCTCCGGGA 
37151 CGAACGCGCA CCTGATCCTG GAGGAGCCGC CGGAGGAGGA CCTGCCCGAG 
37201 CCCGTGGCGG CGGAGCCGGG TGGGGTGGTG CCGTGGGTGG TGTCCGGGCG 
37251 GACGCCGGAC GCGTTGCGTG AACAGGCGCG GCGGCTCGGC GAGTTTGTCG 
37301 TCGGTGCCGG GGATGTGTCG GCAGCCGAGG TGGGATGGTC ACTGGCCACG 
37351 ACGCGGTCGG TGTTCGAGCA CCGGGCCGTG GTGGCGGGCC GGGACCGGGA 
37401 CGATCTGGTT GCCGGGATGC AGGCGCTGGC GGCAGGGGAG ACGCCGACAG 
37451 ATGTCGTGTC CGGTGCGGCG GCTTCCTCCG GTGCGGGGCC GGTGTTGGTG 
37501 TTCCCGGGGC AGGGGTCGCA GTGGGTGGGC ATGGGTGCCC AGCTCCTTGA 
37551 CGAGTCCCCC GTCTTCGCGG CGCGGATCGC GGAGTGTGAG CAGGCGCTGT 
37601 CGGCGTACGT GGACTGGTCG CTGAGTGATG TCCTGCGCGG GGACGGGAGT 
37651 GAGCTGTCCC GGGTCGAGGT CGTGCAGCCC GTGTTGTGGG CGGTAATGGT 
37701 CTCGCTGGCT GCCGTCTGGG CGGATTACGG GGTCACTCCG GCCGCTGTGG 
37751 TGGGGCATTC GCAGGGTGAG ATGGCTGCCG CGTGTGTGGC GGGGGCGCTG 



37801 TCGCTGGAGG ATGCGGCGCG GATTGTGGCG GTACGCAGTG ACGCGCTTCG
37851 TCAGCTGCAA GGGCACGGCG ACATGGCCTC ACTCGGCACT GGTGCCGAGC
37901 AGGCCGCTGA GCTGATCGGT GATCGGCCGG GAGTGGTCGT CGCGGCAGTC
37951 AACGGGCCGT CGTCTACCGT GATTTCGGGG CCGCCGGAGC ATGTGGCCGC
38001 TGTGGTCGCG GAGGCGGAGG CACGTGGTCT GCGCGCCCGT GTGATCGACG
38051 TCGGGTATGC CTCGCACGGC CCCCAGATCG ACCAGCTCCA CGACCTCCTC
38101 ACCGAGGGCC TGGCTGACAT CCGGCCCGCG AACACGGACG TGGCCTTCTA
38151 TTCGACGGTC ACCGCCGAGC GCCTGACGGA CACCACAGCC CTGGATACGG
38201 ATTACTGGGT GACCAACCTC CGCCAGCCGG TCCGGTTCGC CGACACCATC
38251 GAAGCGCTTC TCGCGGACGG CTATCGCCTG TTCATCGAGG CCAGCGCGCA
38301 CCCGGTGTTG GGCCTGGGCA TGGAGGAGAC CATCGAGCAG GCGGACATCC
38351 CTGCCACGGT CGTCCCCACC CTGCGCCGCG ACCACGGCGA CACCACCCAG
38401 CTCACCCGCG CCGCCGCCCA CGCCTTCACC GCCGGCGCCG ATGTCGACTG
38451 GCGACGCTGG TTCCCGGCCG ACCCCACCCC CCGTACCGTC GACCTCCCCA
38501 CCTACGCCTT CCAGCACCAG CACTACTGGC TGGAGGAGCC CAGTGGGCTC
38551 ACCGGAGACG CCGCCGACCT CGGCATGGTG GCCGCCGGGC ATCCGCTGCT
38601 CGGTGCCTGT GTGGAACTCG CGGAGAGCGA CTCGTACTTG TTCACCGGGC
38651 GGCTCTCGCG CAGGGCTCCG TCCTGGCTGG CCGAACACGT GGTGGCGGGG
38701 ACGGTTCTGG TGCCGGGTGC GGCGTTGGTG GAGTGGGTGC TGCGGGCCGG
38751 CGATGAGGCG GGATGCCCGA CGATTGAGGA ACTGACGCTC CAGGCGCCGT
38801 TGGTGCTGCC CGAGTCGGGC GGGTTGCAGG TTCAGGTGGT CGTGGGTGCG
38851 ACCGATGAGC AGAGCGGCCG TCGTGACGTA CACGTGTATT CGAGGTCTGA
38901 GCAGGACGCG TCGGCGGTGT GGGTGTGCCA TGCCGTCGGT GTGGTGAGCT
38951 CCGAAATGCC AGAAGCGGCA GCCGAGTTGA GTGGGCAGTG GCCTCCTGCC
39001 GGGGCCGAAG CCGTGGATGT CGAGGACTTC TACGCGCGGG CCGCGGAGGC
39051 CGGATACGCC TACGGTCCGG CGTTCCAGGG GCTGCGGGCG CTGTGGCGGC
39101 ACGGGACGGA GCTGTTCGCC GAGGTGGTGC TGCCCGAACA GGCGGGTGGG
39151 CACGACGGTT TCGGCATCCA CCCGGCGCTG CTGGACGCCG CCCTGCATCC
39201 GCTGATGCTC CTCGACCGGC CCGCGGACGG GCAGATGTGG CTGCCGTTCG

39251 CGTGGAGCGG GGTGTCGCTG AACGCGGACC GGGCGACCCA CGTCCGTGTC
39301 CGGCTCTCCC CGCGGGGGGA GGCGGCCGAG CGTGACCTGC GGGTCGTCAT
39351 CGCCGACGCG ACCGGCGCGC CCGTCCTGAC GGTCGACGCC CTGACCCTGC
39401 GCGCGGCCGA TCCCGGCCGG CTGGGTGCGG CGGCCCGTGG CGGTGTCGAC
39451 GGCCTCTACA CCGTCGACTG GACCCCGCTG CCCCTGCCCC AGCCCCTTCC
39501 GCTGCCGCGG ACGGATGCAG GGGGGAGTGC CGACTGGGTC ATACTCTCGG
39551 ACAACTCCAG TGCAGCTCTG GCTGATGCCG TGTCGTCCGC GACGGCGGCA
39601 GGTGGCGGAG CGCCGTGGGC ATTGCTCGCT CCCGTGGGTG GCGGCTCTGC
39651 CGATGACGGG CTGCCGGTGG TGCGGCGGAC CCTCTCCCTC GTACAGGAGT
39701 TCCTGGCCGC CCCGGAGCTG ACCGAGTCCC GTCTCGTCAT CGTGACACGC
39751 GGTGCCGTGG CCACCGACGC CGATGGTGAC GTCGCGGCGT CCGCGGCAGC
39801 GGTATGGGGC CTGATCCGCA GCGCCCAGTC GGAGAACCCG GGCCGCTTCG
39851 TCCTGCTCGA CGTCGAGGAG GAGCACCTCC ACCCGGACGG CGGGGAACTG
39901 CCGTACGCCG CCCTGCGCCA CGCCGTAGAG GAGCTCGACG AGCCTCAACT
39951 TGCCCTCCGC AGCGGCAAAT TCCTCGTACC GCGCATGACG CCCGCCGCCG
40001 CCCCCGAGGA GCTCGTCCCG CCGGTCGGTA CGTCCGGCTG GCGCCTCGGC
40051 ACCTCCGGTA CGGCCACCCT GGAGAATCTG TCGGTGATCG ACGCTCCCGA
40101 GGCGTTCGCG CCGCTGGAGC CCGGGCAGGT GCGGATCTCC GTACGGGCGG
40151 CGGGCATGAA CTTCCGTGAC GTGCTGATCG CGTTGGGCAT GTATCCCGAC
40201 AAGGGCACGT TCGCGGGAAG CGAGGGCGCC GGACATGTGA CGGAGGTGGG
40251 ACCGGGCGTC ACTCATCTGT CGGTCGGTGA CCGGGTGATG GGTCTGTTCG
40301 AGGGCGCGTT CGCTCCGCTG GCCGTCGCGG ACGCCCGGAT GGTCGTCCCG
40351 ATTCCGGAGG GCTGGAGCTT CCAGGAGGCC GCGGCGGTGC CCGTGGTGTT
40401 CCTCACGGCC TGGTACGGCC TCGTGGACCT CGGCCGCCTC CGGGCGGGCG
40451 AATCGCTGCT CATCCACGCG GGCACCGGCG GAGTGGGCAT GGCCGCCACC
40501 CAGATCGCCC GCCACCTGGG CGCCGAGGTG TTCGCCACCG CGAGCCCCGC
40551 CAAGCACGGC GTGCTCGACG GCATGGGCAT CGACGCGGCC CACCGCGCCT



40601 CCTCCCGTGA CCTCGACTTC GAGGAGACCT TGCGGGCGGC GACGGGCGGG
40651 CGCGGCATGG ACGTCGTACT CAACAGTCTG GCCGGGGAGT TCACCGACGC
40701 CTCGCTGCGG CTGCTCGCCG AGGGCGGGCG CATGGTGGAC ATGGGCAAGA
40751 CCGACAAGCG CGACCCCGAC CGGGTCGCGG CCGAGCACGC GGGCGCGTGG
40801 TACCGGGCCT TCGACCTCGT GCCGCACGCG GGGCCCGACC GGATCGGGGA
40851 AATGCTGGCG GAGCTGGGCG AGTTGTTCGC CTCCGGCGCC CTGGCGCCGC
40901 TGCCCGTCCA GACCTGGCCG CTGGGCCGGG CGCGTGAGGC GTTCCGGTTC
40951 ATGAGCCAGG CGAAGCACAC CGGCAAGCTG GTGCTGGAGA TCCCGCCCGC
41001 CCTCGATCCG GACGGCACGG TGCTCATCAC CGGCGGCACC GGGGTCCTCG
41051 CCGCCGCGGT GGCCGAGCAT CTGGTGAGGG AGTGGGGCGT ACGACACCTG
41101 CTGCTGGCCG GGAGGCGCGG TTCCGAGGCG CCCGGGAGCA GTGAACTCGC
41151 CGAGGAACTG ACCGAGTTGG GGGCCGAGGT GACCTTTGCC GCGGCCGATG
41201 TCAGTGATCC GGACGCCGTG GCGGAGCTCG TCGGCAAGAC CGATCCGGCG
41251 CACCCGCTGA CCGGTGTGAT CCACGCGGCC GGTGTGCTGG ACGACGCCGT
41301 GGTCACCGCA CAGACCCCGG AGAGCCTCGC GCGGGTGTGG GCGGCGAAGG
41351 CGACGGCCGC ACACCTGCTG CACGAGGCGA CCCGGGAGGC GCGCCTCGGT
41401 CTCTTCCTGG TGTTCTCCTC GGCGGCGGCG ACACTCGGCA GTCCGGGACA
41451 GGCCAACTAC GCGGCGGCCA ACGCCTATTG CGACGCCCTC GTCCGGCAAC
41501 GGCGTGCCGA GGGCCTGGCC GGTCTCTCGA TCGGCTGGGG TCTGTGGCAG
41551 ACGGCGAGCG GCATGACCGG ACACCTCGGC GAGACGGACC TGGCACGCAT
41601 GAAGCGCACC GGGTTCACCC CGCTGACCAC CGAAGGTGGC TTGGCCCTCC
41651 TCGACGCCGC CCGCGCCCAC GGCCGCCCGC ACGTGGTCGC GGTGGACCTC
41701 GACGCGCGCG CCGTCGCCGC GCAGCCCGCC CCGTCCCGGC CCGCGCTCCT
41751 GCGCGCCCTG GCCGCGGGTG CGACCCCGGG GGCCCGCACC GCCCGGCGCA
41801 CCGCGGCCGC GGGCAGCGTC GCCCCGGCGG GCGGTCTCGC CGACCGGCTC
41851 GCCGGCCTGC CGCATCCCGA ACGGCGCCGG CTGCTGCTCG ACCTCGTACG
41901 TGGCAACGTC GCCGGCGTCC TCGGGCACAG CGACCACGAC GCCGTCCGCC
41951 CGGACACGTC GTTCAAGGAG CTCGGCTTCG ACTCCCTGAC CGCCGTGGAA



42001 CTGCGCAACC GGCTGGCCGC CGCCACCGGC CTGAAGCTGC CCGCGGCGCT
42051 CGTCTTCGAC TACCCCGAGT CGGCCACCCT CGTCGACCAC CTCCTGGAGC
42101 GTCTGTCGCC CGACGGCGCG CCGCCGCCCG TCAAGGACGC CGCGGACCCC
42151 GTTCTCAACG ACCTCGGCAG GATCGAGTCC TCCCTGGACG CGCTCGCCCT
42201 CGACGCGGAC GCGCGCAGCC GGGTCACCAG GCGTCTGAAC ACCCTGCTGT
42251 CGAAGCTGAA CGGAGCCGCC ACCGCCGGCT CCCCGGCGGA CGTCACGGAC
42301 CTGGACGCGC TGGACGCGCT GGACGACGTG TCCGACGACG AGATGTTCGA
42351 GTTCATCGAC CGAGAGCTGT GACCCCCCTG CCCGCCCCGT CCCCCTTCCC
42401 CGCCCCCACG TTCCCCGTGC CCTTCGCTGA TGGAGAAGTG ACGTTCGATG
42451 TCGAGTGCTG AAGAGTCGAG TCCTGATGTG TCCGGCACGG GTGTGTCCGG
42501 TACGGGAGAG TCCGCTACGG GTACGTCGAG TACGGAAGCC AAGCTTCGGC
42551 AGTATCTGAA GCGGGTCACG GTGGACCTCG GCCAGGCCCG CCGGCGGCTG
42601 CGCGAGGTGG AGGAGCGGGC CCAGGAGCCG ATCGCCATCG TCTCCATGGC
42651 GTGCCGCTTC CCCGGCGACA CCCGCACGCC CGAGGCCCTG TGGGACCTGG
42701 TCGCCGAGGG CGGCGACGCC ATCGACGACT TCCCCACCAA TCGCGGCTGG
42751 GACCTGGAGA GCCTCTACCA CCCCGACCCC GACCACCCCG GCACCAGCTA
42801 CGTCCGACGC GGCGGGTTCC TGTACGACGC CCCCGCCTTC GACGCGTCGT
42851 TCTTCGGGAT CAGCCCGCGC GAAGCCCTGG CCATGGACCC GCAGCAGCGG
42901 GTGCTCATGG AGACGGCCTG GCAGCTCCTG GAGCGGGCCG GCATCGACCC
42951 GGCCTCGCTG AAGCTGAGCG CCACCGGCGT CTACATCGGC GCGGGCGTGC
43001 TCGGGTTCGG CGGCGCGCAG CCCGACAAGA CGGTAGAGGG CCACCTCCTG
43051 ACCGGCAGCG CGCTGAGTGT CCTGTCCGGC CGCATCTCCT TCACGCTCGG
43101 CCTCGAGGGC CCGTCGGTCA GTGTCGACAC GGCGTGCTCC TCCTCGCTGG
43151 TCTCCATGCA CCTGGCGGCC CAGGCGCTGC GGCAGGGGGA GTGCGATCTC
43201 GCGCTGGCCG GCGGTGTCAC CGTGATGTCG ACGCCCGGCG CGTTCACCGA
43251 GTTCTCCCGC CAGGGCGCGC TGTCTCCGGA CGGCCGCTCG AAGGCTTTCG
43301 CGGCCTCGGC CGACGGCACC GGTTTCTCGG AGGGCGCGGG ACTGCTCCTC
43351 CTGGAGCGGC TCTCCGACGC GCGCCGCAAC GGCCACAAGG TGCTCGCGGT
43401 GATCCGCGGC TCGGCCGTCA ACCAGGACGG CGCGAGCAAC GGTCTCACCG

43451 CCCCCAACGG CCCCTCCCAG GAACGCGTGA TCCGCGCCGC CCTCGCCAAC
43501 GCGGGCCTGG GCGCCGCCGA GGTCGACGCG GTCGAGGCAC ACGGCACCGG
43551 CACGAAGCTC GGCGACCCCA TCGAGGCCGG TGCGCTGCTC GCCACCTACG
43601 GCCGCGACAG GGACGAGGAC CGGCCGCTGT GGCTGGGCTC GGTCAAGTCG
43651 AACATCGGTC ACCCGCAGGG CGCAGCAGGC GTCGCGGGCG TCATCAAGAT
43701 GGTGATGGCG CTGCAGCGCG AACTGCTCCC CGCCACCCTG TACGTCGACG
43751 AGCCCACCCC GCACGTCGAC TGGTCCTCGG GCTCCGTCAG GCTCCTCACC
43801 GAACCGGTCC CGTGGACCCG CGGCGAGCGC CCGCGCCGCG CGGGCGTGTC
43851 CGCCTTCGGC ATGTCCGGGA CGAACGCCCA CGTGATCCTG GAGGAGGCAC
43901 CGCCCGAGGA GGCAGCGGCC GCGGAGACAC CGGCGGAAGG GACAGGCGCA
43951 GTCGTCCCGT GGGTCGTCTC CGGCCGGGGC GAGGAAGCGC TGCGGGCCCA
44001 GGCCGCACAG CTCGCCGAGC ACGTGCGCGA CGACGACCAG CGGCCGGCGT
44051 CACCGCTGGA GGTGGGGTGG TCGCTCGCCA CGACACGGTC GGTGTTCGAG
44101 AACCGGGCCG TCGTCGTCGG GGACGACCGC GACGCGCTCC TCGACGGCCT
44151 CCGGTCGCTG GCGGCAGGTG AGGCGTCGCC GGACGTGGTG TCCGGGGCGG
44201 TCGGCCCCAC GGGGCCCGGG CCGGTCATGG TGTTCCCCGG CCAGGGCGGC
44251 CAGTGGGTGG GCATGGGGGC CCGGCTCCTC GACGAGTCCC CGGTGTTCGC
44301 GGCCCGGATC GCCGAGTGCG AGCAGGCCCT GTCGGCGTAC GTGGACTGGT
44351 CCCTGACCGA CGTGCTGCGC GGGGACGGGT CGGAGCTGGC CCGGATCGAC
44401 GTCGTCCAGC CCGTGCTGTG GGCCGTCATG GTCGCGCTCG CCGCCGTCTG
44451 GGCGGACCAG GGAATCGAAC CCGCCGCCGT CGTCGGCCAC TCGCAGGGCG
44501 AGATAGCCGC GGCGTGCGTC GTGGGCGCCA TCTCCCTGGA CGAGGCGGCC
44551 CGCATCGTCG CCGTACGCAG TGTGCTGCTG CGGCAGCTGT CCGGACGCGG
44601 CGGCATGGCG TCCCTGGGGA TGGGCCAGGA GCAGGCCGCC GACCTGATCG
44651 ACGGACACCC GGGTGTGGTC GTCGCGGCCG TCAACGGGCC GTCGTCCACC
44701 GTCATCTCGG GCCCGCCCGA GGGCATCGCC GCCGTCGTCG CCGACGCCCA
44751 GGAGCGGGGC CTTCGCGCCA GGGCCGTCGC CTCCGACGTC GCGGGCCACG
44801 GCCCGCAGCT GGACGCGATC CTGGACCAGC TCACGGAGGG CCTGGCCGGC
44851 ATCCGGCCCG CCGCGACCGA CGTCGCGTTC TACTCCACCG TCACCGCCGG
44901 GCACCTCACC GACACCACCG AACTCGACAC CGCGTACTGG GTGCGGAACG
44951 TGCGCCGGAC GGTGCGTTTC GCCGACACGA TCGACGCGCT GCTCGCGGAC
45001 GGGTACCGCC TGTTCATCGA GGTGAGCCCC CACCCCGTCC TCAACCTCGC
45051 GCTGGAAGGC CTCATCGAAC GGGCGGCCGT GCCCGCCACG GTCGTGCCCA
45101 CCCTGCGCCG CGACCACGGC GACACCACCC AGCTCGCCCG CGCCGCGGCC
45151 CACGCCTTCG CCGCCGGCGC GGACGTCGAC TGGCGGCGCT GGTTCCCGGC
45201 CGACCCCGCC CCCCGTACCG TCGACCTGCC CACCTACGCC TTCCAGCGCC
45251 AGGACTTCTG GCCGGCCCCC GCCGGCGGGC GGTCCGGCGA CCCTGCCGGG
45301 CTCGGCCTCG CCGCCTCCGG ACACCCGCTC CTGGGCGCCT CCGTGGGCCT
45351 CGCGAGCGGG GACGTACACC TGCTGAGCGG GCGGGTGTCC CGGCAGTCCG
45401 CCGCGTGGCT GGACGACCAC GTCGTGGCGG GCCAGGCCCT GGTGCCCGGC
45451 GCGGCGCAGG TGGAGTGGGT GCTGCGGGCC GGCGACGACG CGGGCTGCTC
45501 CGCCCTGGAG GAGCTGACGC TCCAGACGCC GCTCGTGCTG CCCGACACCG
45551 GCGGCCTGCG GATCCAGGTC GTCGTCGAAG CGGCCGACGC ACACGGCCGG
45601 CGCGACGTCC GGCTGTTCTC CCGCCCCGAT GACGACGACG CCTTCGCGTC
45651 GACGCACCCC TGGACCTGCC ACGCCACGGG CGTGCTCGCC CCCGCCCCGA
45701 CGGACGGCAC CAACGGAACG CGGGACGCCG CCGACACCCT GGACGGCGCA
45751 TGGCCCCCGG CCGACGCCGA ACCCGTCCCC GCCGACGACC TCTACGCGCA
45801 GGCCGACCGC ACCGGATACG GCTACGGCCC CGCCTTCCGG GGCGTACGGG
45851 CGCTGTGGCG CCACGGCAAG GACGTCCTGG CCGAGGTGAC GCTGCCCAAG
45901 GAGGCCGGCG ACCCGGACGG CTTCGGTATC CACCCGGCCC TCCTCGACGC
45951 CGTCCTGCAA CCCGCCGCAC TGCTGCTGCC CCCGACCGAC GCCGAACAGG
46001 TCTGGCTGCC GTTCGCCTGG AACGACGTGG CGCTGCACGC CGTACGGGCC
46051 ACCACGGTCC GGGTGCGCCT CACCCCGCTC GGCGAGCGGA TCGACCAGGG
46101 GCTGCGCATC ACCGTGGCCG ACGCCGTGGG CGCGCCCGTG CTCACCGTCC
46151 GCGACCTGCG CTCGCGCCCG ACCGACACAG GCCGCCTCGC CGCGGCCGCG
46201 ACCCGCGACC GGCACGGGCT GTTCGACCTG GAGTGGATCG CGCCGGAGAA

46251 CGCGGCGGAG AACGCGGCGG GTCCGGCCCG GGACGCGTCC GAAGGGTGGG
46301 TGACACTCGG CGAGGACGCC GCGAGCCTCG CGGACCTGCT GGCGTCCGTC
46351 GAGGCGGGCG CTCCGGCGCC GCAGCTCGTG GCCGCCCCCG TCGAACCCGA
46401 CCGGACCGAC GACGGCCTGG CACTCGCCAC CCACGTCCTC GACCTCGTAC
46451 AGACCTGGCT CGCCTCGCCC CTGCACGACT CCCGCCTGGT CCTGGTGACG
46501 CGAGGGGCAG TGACGGATGC GGATGTGGAT GTGGCTGCCG CGGCCGTTTG
46551 GGGTCTGGTA CGCAGCGCCC AGTCGGAGCA CCCCGGCCGC TTCACGCTGA
46601 TCGACCTCGG CCCCGACGAC ACGCTTGCCG CAGCCATGCA GGCGGCGCAC
46651 CTGGAAGAGC CGCAACTGGC GGTGCACGGC GGCGAGATAC GAGTGCCGCG
46701 ACTGGTCCGC GCCACGACCG ACCCGACCGC CCCGAACGGG ACACCGGAGG
46751 CCGACCGGAC GGCGGACCCG TCCGAAGGAC TCCACCGGAA CGGTACGGTT
46801 CTCATCACCG GCGGCACCGG CGTACTCGGC CGACTGGTGG CCGAACACCT
46851 GGTCACGGAG TGGGGCGTAC GCCACCTGCT GCTCGCGAGC CGACGCGGCG
46901 ACCAGGCGCC GGGTAGCGCC GAACTCCGCG CCCGCCTGAG CGAATTGGGA
46951 GCATCGGTCG AGATCGCCCC GGCCGATGTC GGCGACGCGG AAGCGGTCGC
47001 CGCACTGATC GCGTCGGTCG ACCCGGCGCA CCCGCTCACC GGTGTGATCC
47051 ACGCGGCCGG TGTCCTGGAC GACGCCGTGA TCACCGCCCA GACCCCCGAG
47101 AGCCTCGCGC GGGTGTGGGC GACGAAGGCG ACGGCGGCCC GCCATCTGCA
47151 CGAGGCGACA CGGGAGACAC CCCTCGACTT CTTCGTGGTG TTCTCCTCGG
47201 CGGCCGCCTC GCTCGGCAGC CCCGGCCAGG CCAACTACGC GGCGGCCAAC
47251 GCCTATTGCG ACGCCCTCGT CCAGCACCGC CGCGCCCAAG GGCTCGCGGG
47301 CCTCTCGATC GCCTGGGGCC TGTGGCAGGC GACCAGCGGC ATGACCGGGC
47351 AGCTGAGCGA GACCGACCTG GCGCGCATGA AGCGCACCGG GTTCGCCGCG
47401 CTGACCGACG AGGGCGGCCT GGCCCTGCTC GACGCCGCCC GTGCCCACGA
47451 CCGGGCCTAC GTGGTCGCGG CCGACCTCGA CCCGCGCGCC GTGACCGATG
47501 GCCTGTCCCC GCTCCTGCGC GCCCTCACGG CGCCCGCCAC GCGGCGGCGC
47551 GTGGCCTCCG AAGGCCTCGC CGACGGGGCG CTCGCGACCC GCCTGGCCGG
47601 CCTCGACGCG GACGGCCGCC TAAGGCTCCT CACCGATGTC GTACGCGAGT

47651 ACGTCGCGGC CGTCCTCGGC CATGGTTCCG CCGCCCGGGT GGGCGTCGAC
47701 ATCGCCTTCA AGGACCTGGG TTTCGACTCG CTGACCGCGG TGGAGCTGCG
47751 CAACCGGCTG TCGGCCGCCT GTGACGTGCG GCTGCCCGCC ACACTGATCT
47801 TCGACCACCC CACCCCGCAG GCTCTCGCCA CCCACCTGGT GGACCGCTTG
47851 GCGGGCAGCA CCTCCGCGAC CACGACGGTG AATGCGACGG CGCCGGCAGC
47901 CGCCCACGTC GCCGCAGGGG CCGACGTCGA CGCAGACACC GACGACCCGG
47951 TCGCCATCGT CGCCATGACG TGCCGGTTCC CGGGCGGCGT CGCGTCCCCG
48001 GACGACCTGT GGGACCTGCT CGACGCACGC AAGGACGCGA TGGGCGCCTT
48051 CCCCACCGAC CGCGGCTGGG ACCTGGAACG CCTCTTCCAC CCCGACCCGG
48101 ACCACCCCGG CACCAGCTAC ACCGACCAGG GCGGATTTCT TCCCGACGCG
48151 GGTGATTTCG ATGCGGCGTT CTTCGGGATC AATCCGCGGG AGGCGCTGGC
48201 GATGGATCCG CAGCAGCGGT TGTTGCTGGA GGCGTCGTGG GAGGTGTTGG
48251 AGCGTGCGGG TATCGATCCG ACGACGCTCA AGGGCACCCC GACCGGCACC
48301 TACGTGGGCC TCATGTACCA CGACTACGCC AAGTCCTTCC CCACGGCCGA
48351 CGCCCAGTTG GAGGGCTACT CCTACTTGGC GAGCACCGGC AGCATGGTCT
48401 CCGGCCGCGT CGCCTACACC CTGGGCCTTG AAGGTCCGGC GGTGACGGTC
48451 GACACCGCGT GCTCCTCCTC CCTGGTCTCC ATCCACCTGG CGACGCAGGC
48501 ACTCCGGCAC GGCGAGTGCG ACCTCGCCCT GGCAGGCGGT GTGACCGTCA
48551 TGGCCGACCC GGACATGTTC GCGGGCTTCT CGCGCCAGCG CGGCCTCTCA
48601 CCTGACGGCC GCTGCAAGGC CTACGCCGCC GCGGCCGACG GAGTCGGATT
48651 CTCCGAGGGA GTGGGCGTAT TGCTCCTTGA GCGGTTGTCG GATGCGCGGC
48701 GTCATGGGCG TCGGGTGTTG GGTGTGGTGC GGGGTTCGGC GGTGAATCAG
48751 GACGGTGCGA GTAATGGGTT GACGGCGCCG AATGGTCCGT CGCAGGAGCG
48801 GGTGATTCGT CAGGCGTTGG CCAGTGGTGG GTTGTCGTCG GTGGATGTTG
48851 ATGTGGTGGA GGGGCATGGG ACGGGGACCA CGTTGGGTGA TCCGATCGAG
48901 GCGCAGGCTC TGCTGGCCAC ATATGGGCAG GGGCGTCCGG AGGACCGTCC
48951 GTTGTGGTTG GGGTCGGTGA AGTCGAACAT TGGTCATACG CAGGCGGCTG
49001 CGGGTGTTGC GGGTGTCATC AAGATGGTGA TGGCGATGCG GCATGGTGTG
49051 GTGCCGGCGA GTTTGCATGT GGATGTGCCG TCGCCGCATG TGGAGTGGGA
49101 TTCGGGTGCG GTGCGGTTGG CGGTTGAGTC GGTGCCATGG CCGCAGGTGG
49151 AGGGTCGTCC GCGTCGGGCG GGTGTGTCGT CGTTCGGCGC TTCGGGGACG
49201 AATGCGCACG TGATCGTGGA GTCTGTTCCC GATGGGCTGG AGGAGGACTC
49251 GGTATCGGTC GGCGGTGAGG CTCTTGAGAC GGAGACTGAC GGGCGCTTGG
49301 TGCCGTGGGT GGTGTCGGCC CGCAGCCCGC AGGCCCTGCG CGACCAGGCA
49351 CTACGCCTGC GTGACTTTGC CAGTGACGCG TCGTTCCGCG CGCCGCTCGC
49401 CGACGTGGGC TGGTCGCTGC TGAAGACGCG TGCGCTGCAT GAGCATCGCG
49451 CCGTTGTGGT GGGCGCGGAG CGGGCAGAGC TGATCGCCGC TCTGGAGGCG
49501 CTGGCGACGG GTGAGCCGCA TGCGGCGCTG GTCGGCCCGG CTTGCTCGCA
49551 GGCTCGGGTG GGTGGCGATG ACGTGGTGTG GCTGTTCAGT GGTCAGGGCA
49601 GTCAGTTGGT CGGTATGGGT GCTGGTTTGT ATGAGCGGTT CCCGGTGTTT
49651 GCGGCTGCGT TTGATGAGGT GTGCGGCCTG TTGGAGGGGC CGTTGGGCGT
49701 GGAGGCGGGT GGGTTGCGGG AGGTGGTGTT CCGTGGCCCG CGGGAGCGGT
49751 TGGATCACAC GGTGTGGGCG CAGGCGGGGT TGTTTGCGCT GCAGGTGGGG
49801 TTGGCCCGGT TGTGGGAGTC GGTCGGGGTG CGGCCGGATG TGGTGCTCGG
49851 GCATTCGATC GGTGAGATCG CGGCCGCGCA TGTGGCGGGG GTTTTTGATC
49901 TGGCGGATGC GTGTCGGGTG GTGGGTGCGC GGGCGCGTTT GATGGGTGGG
49951 CTGCCTGAGG GTGGGGCGAT GTGCGCGGTG CAGGCCACGC CCGCCGAGCT
50001 GGCCGCCGAC GTGGACGGAT CGGCTGTAAG TGTGGCGGCA GTCAACACCC
50051 CCGACTCCAC GGTGATTTCG GGCCCGTCGG ACGAGGTGGA CCGGATTGCT
50101 GGGGTGTGGC GGGAGCGTGG GCGCAAGACG AAGGCGCTGA GCGTCAGTCA
50151 TGCCTTCCAT TCGGCGTTGA TGGAGCCGAT GCTCGCGGAG TTCACCGAAG
50201 CGATCCCGCT C ATG AG C Att.T
50251 GTCTCCGGAG AGCGGGCCGG CGAGGAGATC ACGGATCCGG AGTACTGGGC
50301 GAGGCATGTA CGTAATGCGG TGCTCTTCCA GCCCGCCATC GCCCAAGTAG
50351 CGGATTCAGC GGGCGTGTTT GTGGAGCTCG GCCCCGCGCC TGTGCTGACC



50401 ACGGCCGCCC AGCACACCCT GGACGAGTCG GACAGCCAGG AGTCGGTGCT
50451 GGTCGCGTCT CTCGCCGGTG AGCGTCCTGA GGAGTCGGCG TTTGTGGAGG
50501 CGATGGCTCG TCTGCATACC GCTGGTGTTG CTGTGGACTG GTCGGTGTTG
50551 TTCGCGGGTG ATCGTGTGCC TGGGCTGGTG GAGTTGCCGA CGTATGCGTT
50601 CCAGCGGGAG CGGTTCTGGT TGAGTGGCCG TTCTGGGGGT GGGGATGCGG
50651 CGACTTTGGG GTTGGTGGCG GCGGGGCATC CGTTGTTGGG TGCGGCGGTG
50701 GAGTTCGCGG ACCGGGGTGG GTGTCTGCTG ACCGGTCGTC TGTCGCGGTC
50751 TGGGGTGTCG TGGCTTGCTG ATCATGTGGT GGCGGGTGCG GTTTTGGTGC
50801 CGGGTGCTGC GTTGGTGGAG TGGGCGTTGC GGGCCGGTGA TGAGGTCGGT
50851 TGTGTGACGG TGGAGGAGTT GATGTTGCAG GCGCCTTTGG TGGTGCCTGA
50901 GGCGTCGGGT CTGCGGGTTC AGGTGGTGGT TGAGGAGGCG GGTGAGGACG
50951 GGCGGCGCGG TGTTCAGATC TACAGCCGGC CCGACGCGGA CGCCGTGGGC
51001 GGCGATGACT CGTGGATCTG CCACGCGACC GGCGTACTGT CACCCGAAAG
51051 CGCTCGTCTG GACACGGAGT TGGGTGGCGT CTGGCCACCG GCCGGTGCCG
51101 AACCGCTGGA TGTCGACGGC TTCTACGCGC AGGCCGGTGA GGCCGGGTAC
51151 GGATACGGTC CGGCGTTCCG GGGGCTGCGT GCCGTGTGGC GGCACGGCCA
51201 GGACCTGCTG GCCGAGGTCG TCCTGCCCGA AGCCGCCGGT GCCCATGACG
51251 GCTACGGGAT CCACCCCGCC CTCCTCGACG CCACCCTCCA TCCGCTGCTC
51301 GCCGCCCGCT TCATGGACGG TTCCGAGGAC GATCAGCTCT ACGTACCGTT
51351 CGGGTGGGCC GGAGTGTCTC TGCGGGCGGT GGGAGCCACG ACTGTGCGCG
51401 TGCGCCTCCG TCCGGTCGGG GAGAGCGTCG ACCAAGGGCT GAGCGTGACG
51451 GTCACCGATG CGACCGGCGG TCCCGTTCTG AGCGTCGACT CCCTCCAGAC
51501 CCGCCCCGTG AAGCCGAGCC AATTGGCTGC GGCCCAACAG CCGGACGTAC
51551 GCGGTCTGTT CACTGTGGAG TGGACGCCGC TGCCGCAGAC GGATGCCGAC
51601 GGGGAGGCCG ACTGGGTTGT GCTCTCGGAC GGTGTTGGCC GTCTGGCTGA
51651 TGTGGTGTCG GCGGCGGGTG GTGAAGCGCC GTGGGCAGTG GTCGCTCCTG
51701 TCGATGCGTC TGTGGGCGAC GGCCGTGAGG GTCTTGACGG TCGGCTGGTC
51751 GTGGAGCGGG TGCTGTCACT CGTACAGGAG TTCCTGGCCC TGCCGGAGCT



51801 GGCCGAGTCC CGTCTCCTCG TGGTGACGCG CGGTGCGGTG GCCACCGGCG
51851 TCGACGGTGA CGGTGACGTG GACGCGTCCG CCGCAGCTGT ATGGGGCCTG
51901 GTCCGCAGTG CTCAGTCCGA GAATCCGGGC CGCTTCATCC TGCTCGACGT
51951 GGACGGCGAC GGCGACGACC AGGGCCCGGA CCTGAACGGC CGGCATCTGC
52001 CCCACGCCAC CCTGCGTCAC GCCGCCGAGG AACTCGACGA GCCCCAACTC
52051 GCCCTGCGGG AAGGGACGCT CTACGTCCCC CGACTGACCC AGGCGCGCCA
52101 GTCCGCCGAA CTCGTCGTGC CGCCCGGTGA ACCGGCGTGG CGCCTGCGGA
52151 TGGTGCACGA CGGCTCGCTG GACGCCCTGG CGGCAGTGGC CTGCCCGGAG
52201 GCCCTGGAGC CCTTGGCGCC GGGGCAGGTG CGTATCGCCG TACACGCCGC
52251 GGGCATCAAC TTCCGTGACG TACTGGTGGC CTTGGGTATG GTCCCCGCGT
52301 ACGGGGCCAT GGGTGGCGAA GGTGCCGGTG TCGTGACGGA GGTCGGTCCC
52351 GAGGTCACCC ATGTCTCGGT GGGCGACCGC GTGATGGGCG TGTTCGAGGG
52401 CGCGTTCGGC CCTGTGGTGA TCGCCGAGGC GCGGATGGTC ACACCTGTCC
52451 CGCAGGGCTG GGACATGCGG GAGGCGGCCG GTATTCCGGC GGCCTTCCTG
52501 ACGGCTTGGT ACGGGTTGGT GGAGCTGGCC GGTCTGAAGG CGGGCGAGCG
52551 GGTGCTGGTC CATGCCGCGA CGGGTGGTGT GGGGATGGCG GCGGTGCAGA
52601 TCGCCCGGCA TGTGGGTGCC GAGGTGTTCG CCACCGCGAG TCCGGGCAAG
52651 CACGCCGTGC TGGAGGAGAT GGGCATCGAC GCCGCCCACC GCGCCTCCTC
52701 CCGGGACCTC GCCTTCGAGG GCACGTTCAG GGAAGCAACG GGCGGCCGCG
52751 GCATGGACGT CGTGCTCAAC AGCCTTGCCG GCGAGTTCAT CGACGCCTCT
52801 CTGCGGTTGC TCGGCGACGG CGGCCGGTTC CTGGAGATGG GCAAGACCGA
52851 TGTGCGGGCC GCCGAAGAGG TGGCTGCGGA GCACGCGGAC GTCTCGTACA
52901 CGGCGTACGA CCTCGTCGGT GATGCCGGAC CCGACCGCAT CAGCAACATG
52951 CTGGACAAGC TCGTCGAATT GTTCGCCTCA GAACGGCTTA AGCCGCTGCC
53001 GGTACGTTCC TGGCCGCTGG ACAAGGCGCA GGAGGCGTTC CGGTTCATGA
53051 GTCAGGCGAA GCACACCGGC AAGCTGGTGC TTGAGATCCC GCCTGCCCTC
53101 GACCCCGAGG GCACGGTTCT GGTCACCGGG GGCACCGGTG CGCTGGGGCA
53151 GGTCGTGGCC GAGCATCTGG TCCGGGAGTG GGGCGTACGG CACCTGCTGC



53201 TGGCCAGCCG TCGCGGTCCG GAGGCGCCGG GCAGCGACGA ACTGGCCTCG
53251 AAGCTCACCG GGTTGGGTGC CGAGGTCACC ATTGTCGCGG CCGATGTCAG
53301 CGACCCGGCC TCGGTGGTGG AGCTGGTCGG CAAGACGGAT CCCTCGCATC
53351 CGTTGACGGG TGTCGTGCAC GCGGCGGGCG TGTTGGAGGA CGGTGTCGTG
53401 ACCGCTCAGA CGCCTGAGGG GCTGGCGCGG GTGTGGGCGG CCAAGGCTGC
53451 TGCGGCGGCG AATCTCCATG AGGCGACCCG GGAGATGCGT CTCGGCCTGT
53501 TCGTGGTGTT CTCCTCGGCG GCCGCCACGC TCGGCAGTCC GGGCCAGGCC
53551 AACTACGCGG CCGCCAATGC CTATTGCGAC GCGCTGATGC AGCACCGACG
53601 GGCGGTGGGC CAGGTCGGCC TGTCGGTCGG CTGGGGTCTC TGGGAGGCGC
53651 CGGACGCCAA GCCGGGTGTT GCCGCCGACG CCAAGGCGAG TGCTGCCACC
53701 GTCGGCAAGG CGAGTGCTCT ATCCGACGGC ACGAACGGCA GCGCTCCCCA
53751 GGACACGACC GGCACCGCCC CCCAGGGCAT GACCGGCGGA CTCACCGACA
53801 CCGACGTAGC CCGCATGGCA CGTATCGGCG TCAAGGGCAT GAGCAACGCC
53851 CACGGTCTCG CCCTGTTCGA CGCCGCGCAC CGCCACGGCC GCCCCCACCT
53901 GGTCGGCTTC AACCTCGACC TGCGCACCCT GGCCACGCAC CCCCTGCACA
53951 CCCGGCCCGC CCTTCTGCGC GGCCTGGCCA CCCCCACCGC CGGCGGGGCG
54001 AGCAGGCCGA CCGCGACGGC GGGCGGACAG CCCGCCGACC TGGCGGGCCG
54051 GCTGGCCGCG CTGTCGCCGT CGGACCGGCA CCACACGCTG GTCCGGCTCA
54101 TCAGGGAACA GGCCGCCACC GTGCTCGGGC ACCACCCGGA CAGTCTCACC
54151 ACGGGCAGCA CCTTCAAGGA ACTCGGATTC GACTCCCTGA CCGCGGTCGA
54201 ACTGCGCAAC AGGCTGTCCG CCGCCACCGG TCTCCGGCTC CCCGCCGGCC
54251 TGGTCTTCGA CCACCCGGAC GCCGACATCC TGGCCGAACA CCTCGGCGCG
54301 CAACTCGCCC CCGACGGGGA CACCCCCGCC GGTGCGGAAG CCACCGACCC
54351 GGTCCTCCGC GACCTGGCGA AACTCGAGAA CGCCCTCTCC TCCACCCTCG
54401 TCGAGCACCT CGACGCCGAC GCGGTCACGG CCCGACTGGA AGCACTCCTG
54451 TCGAACTGGA AGGCGGCGAG CGCGGCGCCC GGCTCGGGCA GCACGAAGGA
54501 GCAGCTCCAG GTTGCCACGA CCGACCAGGT CCTCGACTTC ATCGACAAAG
54551 AACTGGGTGT GTGAAACGAC CGTGCACGGC GCGACAACCA CGCTGAAGGC



54601 TGGGTGAACT CTCATGGCGA GTGAAGAGGA ACTGGTCGAC TACCTCAAGC
54651 GGGTCGCCGC CGAACTGCAC GACACCCGGC AGCGCCTGCG CGAGGTCGAG
54701 GACCGGCGGC AGGAGCCGGT GGCCGTCGTC GGCATGGCCT GCCGTTTCCC
54751 CGGCGGCATC GAGACGCCCG AGGGACTGTG GGAGCTGGTC GCGGCCGGCG
54801 ACGACGCCAT TGAGCCCTTC CCCACCGACC GGGGCTGGGA CCTGGAAGGC
54851 ATCTACCACC CGGACCCCGA CCACCCGGGT ACCTGCTACG TCCGGGAGGG
54901 CGGGTTCCTA GCCGCCCCTG ACCGGTTCGA CTCCGACTTC TTCGGCTTCA
54951 GCCCGCGCGA GGCCCTGGCC AGCAGCCCGC AACTGCGACT GCTCCTGGAG
55001 ACGTCCTGGG AGGCCCTCGA ACGGGCGGGC ATCAACCCCG CCTCGCTCAA
55051 GGGCAGCCCC ACCGGCGTCT ACGTCGGCGC CGCGACCACC GGCAACCAGA
55101 CGCAGGGCGA CCCCGGCGGC AAGGCGACCG AGGGTTACGC GGGCACCGCG
55151 CCCAGCGTCC TCTCGGGCCG CCTCTCGTTC ACGCTCGGCC TGGAGGGCCC
55201 GGCGGTGACC GTCGAGACAG CGTGCTCCTC CTCGCTGGTG GCGATGCACC
55251 TGGCGGCCAA CGCCCTGCGC CAGGGCGAGT GCGACCTCGC CCTCGCGGGC
55301 GGCGTCACCG TCATGTCCAC CCCCGAGGTG TTCACAGGCT TCTCGCGTCA
55351 GCGGGGACTG GCCCCCGACG GCCGCTGCAA GCCGTTCGCC GCCGCGGCCG
55401 ACGGCACGGG CTGGGGCGAG GGCGCGGGCC TGATCCTCCT GGAGCGCCTC
55451 TCCGACGCCC GCAGGAAGGG CCACAAGGTC CTCGCGGTGA TCCGGGGCTC
55501 GGCGATCAAC CAGGACGGCG CGAGCAACGG CTTCACCGCG CCCAACGGCC
55551 CCTCGCAGCG CCGCGTCATC CGCCAGGCAC TCTCCAGCGC CCACCTCTCC
55601 ACGTCGGAGA TCGACGTCGT CGAGGCGCAC GGCACCGGCA CCAGGCTCGG
55651 CGACCCCATC GAGGCCGAGG CGCTCATCGC CACCTACGGC AAGGAGCGCG
55701 AGGACGACCG TCCCCTGTGG CTCGGCTCGG TCAAGTCCAA CATCGGCCAC
55751 ACGCAGGCCG CCGCGGGCGT CGCCGGAGTC ATCAAGATGG TGATGGCGCT
55801 ACAGCGCGAA CTGCTTCCCG CCACCCTGAA CGTCGACGAG CCGACCCCGC
55851 ACGTCCAGTG GGAGGGCGGC GGCGTACGCC TCCTGACCGA ACCGGTCCCG
55901 TGGTCGCGCG GCGAACGCCC GCGCCGCGCC GGAATCTCCT CCTTCGGCAT
55951 ATCGGGCACG AACGCGCACG TGGTCCTGGA GGAGGCGCCG CCGGAGGAGG



56001 ACGTGCCGGG CCCCGTGGCT GCGGAGCCGG AAGGGGTGGT GCCGTGGGTG
56051 GTCTCCGCGC GGACCGAGGA GGCGTTGAGC GAACAGGCGC GGCGCCTGGG
56101 CGAGTTCGTG GCCGACACGG ACCCGTCGAC CGCTGACGTC GGGTGGTCAC
56151 TGACCACGAG CAGGGCGATC CTTGAACACC GCGCTGTGGT GGTGGGGCGT
56201 GATCGGGATG CGCTGACGGC CGGCCTGGCG GCGTTGGCCG CGGGTGAGGA
56251 GTCGGCGGAT GTGGTGGCTG GGGTGGCCGG TGATGTGGGT CCTGGGCCGG
56301 TGTTGGTGTT TCCGGGGCAG GGGTCGCAGT GGGTGGGCAT GGGCGCCCAG
56351 CTCCTTGACG AGTCGCCCGT CTTCGCGGCG CGGATCGCGG AGTGTGAGCA
56401 GGCGCTGTCG GCGTACGTGG ACTGGTCGCT GAGTGCGGTG TTGCGCGGGG
56451 ATGGGAGTGA ACTGTCCCGG GTCGAGGTCG TGCAGCCGGT GTTGTGGGCG
56501 GTGATGGTCT CGCTGGCTGC CGTCTGGGCG GATTACGGGG TCACCCCGGC
56551 CGCTGTGATC GGGCACTCGC AGGGCGAGAT GGCCGCCGCG TGCGTGGCGG
56601 GGGCGCTGTC TTTGGAGGAT GCGGCGCGCG TCGTGGCCGT ACGCAGTGAC
56651 GCGCTTCGTC AGCTGATGGG GCAGGGCGAC ATGGCGTCGT TGGGCGCCAG
56701 CTCGGAGCAG GCGGCTGAGC TCATCGGTGA TCGGCCGGGC GTATGCATCG
56751 CAGCGGTCAA CGGGCCGTCC TCGACAGTCA TTTCAGGACC GCCGGAGCAT
56801 GTGGCAGCCG TGGTCGCGGA TGCGGAGGAA CGTGGTCTGC GCGCCCGTGT
56851 CATCGATGTC GGCTATGCCT CGCACGGTCC CCAGATCGAT CAGCTCCACG
56901 ACCTCCTCAC CGACCGGCTC GCCGACATCC GGCCCGCGAC CACGGACGTG
56951 GCCTTCTATT CGACGGTCAC CGCCGAGCGC CTGACGGACA CCACGGCCCT
57001 GGATACGGAT TACTGGGTTA CCAACCTCCG CCAGCCGGTC CGTTTCGCCG
57051 ACACCATCGA TGCGCTTCTC GCGGACGGCT ATCGCCTGTT CATCGAGGCC
57101 AGCGCGCACC CGGTGCTGGG TCTGGGCATG GAGGAGACCA TCGAGCAGGC
57151 GGACATCCCC GCCACGGTCG TCCCCACCCT GCGCCGCGAT CACGGTGACA
57201 CCACCCAGCT CACCCGTGCC GCAGCGCACG CCTTCACCGC CGGCGCCACC
57251 GTCGACTGGC GGCGCTGGTT CCCGGCCGAC CCCACCCCCC GCACGATCGA
57301 CCTGCCCACC TACGCCTTCC AGCGCCGCAG CTACTGGTTG CCGGTGGACG
57351 GTGTCGGAGA TGTGCGGTCG GCCGGGCTGC GGCGGGTGGA ACACTCGCTG
57401 TTGCCCGCGG CGCTCGGTCT CGCCGATGGT GCGCTCGTGC TGACCGGACG
57451 GCTCGCGGCG TCCGGTGGTG GTGGCGGTTG GCTCGCGGAT CACGCGGTGG
57501 CGGGCACGAC GCTCGTCCCC GGTGCCGCGC TGGTCGAGTG GGCGTTGCGG
57551 GCCGCCGACG AGGCGGGCTG CCCCTCCCTT GAGGAGCTGA CGCTCCAGGC
57601 ACCTCTGGTG CTGCCCGGCT CCGGGGGCCT CCAGGTCCAA GTGGTCGTGG
57651 GTCCGGCCGA CGGACAGGGC GGCCGGCGTG AGGTGCGCGT CTTCTCGCGT
57701 GTCGACTCGG ACGACGAGGC AGCGGGGCAG GACGAGGGGT GGTCGTGTCA
57751 CGCGACCGGT GTGCTGAGCC CCGAGCCCGG TGCGGTACCG GACGGGCTCA
57801 GCGGACAGTG GCCCCCGACG GGCGCCGAGC CGCTGGAGAT CAGTGATCTC
57851 TACGAGCAGG CGGCATCGGC GGGATACGAG TACGGGCCGT CGTTCCGGGG
57901 CCTGCGCTCC GTGTGGCGGC ACGGGCATAA CCTGCTGGCA GAGGTGGAGC
57951 TGCCCGAACA GGCAGGTGCG CACGACGACT TCGGCATCCA CCCCGTACTG
58001 CTGGACGCCG CGCTGCACCC GGCGCTGCTG CTCGACCAGA ACGCGCCCGG
58051 CGAAGAGCAA GAGCCAGCCC AGCCCGCTCT TCGCCTGCCG TTCGTGTGGA
58101 ACGGCGTCTC CCTGTGGGCC ACCGGCGCCG CGACCGTGCG GGTACGGCTG
58151 GCCCCGCACG GGGGAGGGGA GACGGACGAT AGCGCCGGGC TGCGCGTGAC
58201 GGTCGCCGAC GCCACCGGAG CACCGGTGCT GAGCGTGGAC TCCCTCGCTC
58251 TGCGCCCCGC TGACCCCGAA CTGCTGCGCA CGGCCGGTCG GGCGGGCAGC
58301 GGCACCAACG GCTTGTTCAC GGTGGAGTGG ACCGCTCTGC CCCCGGCGGA
58351 CGTGGCCGAC CACGCCGCAG GCGACGGCTG GGCGGTGCTC GGTCAGGACG
58401 TACCCGACTG GGCCGGAGCG GACATGCCCC GGCATCCCGA CATGGCCTCC
58451 CTGTCGGCCG CGCTGGACGA GGGAACGCAG GCCCCTGCGG CCGTCTTCGT
58501 GGAGACCACA GCCACATCGC ACGCCACACC GAACACCGCA GCGGACGTGA
58551 CGCTCGACGC GTCCGGCCGG GCGGTCGCCG AGCGCACCCT GCACCTGCTG
58601 CGGGACTGGC TCGCCGAACC GCGCCTCGCC GAGACCCGGC TCGTCCTCAT
58651 CACCCACCAC GCGGTGACGA CCCCGGCGGA CGACGACGTG AACGCCGCAC
58701 CCCTCGACGT CCCGGCCGCC GCCCTGTGGG GACTGATCCG CAGCGCACAG
58751 GCCGAACACC CGGACCGCTT CGTTCTGTTG GACACCGACG CGAAGGCCAA
58801 CACCGACCCC GGCCCCGACA CCAGTACTGA CCACAGCACC GCATCGGGTA
58851 CGTACCGAAC CGTCATCGCG CGGGCCCTCG CCACCGGGGA GCCACAGCTG
58901 GCCGTGCGCG CGGGAGAACT GCTGGCTCCC CGCCTCGCCC GAGCCGCCAC
58951 CCCCACACCC GAGACCCCCA CACCCGAGAC ACAGCCCGAC ACCGGATCCG
59001 GGTCCGAGGC CGGGGCCGGG TCCGGATCTG GACCCGGCGC GACACTGGAC
59051 CCCGACGGCA CCGTCCTCAT CGCGGGCGGC ACCGGCATGA TGGGTGGTCT
59101 CGTCGCCGAA CACCTGGTCC GCGCCTGGTC GGTGCGGCAC CTCCTGCTCG
59151 TCAGCCGGCA AGGGCCCGAC GCGCCGGACG CCCGCGACCT CGCCGACCGG
59201 CTGGTCGGCC TGGGCGCGAC GGTACGGATC GTCGCGGCCG ACCTGACGGA
59251 CGGGCGGGCC ACCGCGGACC TCGTCGCGTC GGTCGACCCG GCGCACCCGC
59301 TCACCGGTGT GATCCACGCG GCCGGCGTCC TGGACGACGC CGTGGTCACC
59351 GCGCAGACCT CCGACCAGCT GGCCAGGGTG TGGGCGGCCA AGGCGTCCGT
59401 CGCCGCCAAC CTGGACGCGG CCACGTCGGA GCTGCCGCTC GGCTTGTTCC
59451 TGATGTTCTC GTCCGCCGCC GGTGTCCTCG GCAACGCGGG CCAGGCCGGT
59501 TACGCGGCCG CCAACGCCTT CGTCGACGCC CTGGTCGGCC GCCGTCGCGC
59551 CACCGGCCTG CCCGGCCTGT CGATCGCCTG GGGCCTGTGG GCGCGCGGCA
59601 GCGCCATGAC CCGGCACCTG GACGACGCCG ACCTCGCGCG GCTGCGTGCC
59651 GGCGGGGTCA AGCCCCTGCT GGACGAGCAG GGCCTCGCCC TCCTCGACGC
59701 GGCGCGCGCC ACCGCCGCGC ACACCTCGCT GGTGGTCGCG GCCGGTATCG
59751 ACGTACGCGG ACTGAACAGG GACGACGTCC CCGCGATCCT CCGCGACCTG
59801 GCGGGCCGGA CCCGCCGCAG GGCGGCCGCC GACTCCACCG TCGACCAGGC
59851 CGCGCTGGAG CGGCGCCTCA CGGGCCTGGA CGAGGCCGAG CGCCGGGCTG
59901 TCGTCACCGA CGTCGTACGC GAATGCGTGG CGGCCGTGCT CGGCCACCGG
59951 TCGGCGGCCG ACGTACGCAC CGAGGCCAAC TTCAAGGACC TCGGCTTCGA
60001 CTCGCTCACT GCGGTGCAGC TGCGCAACCG CCTCTCGGCG GCGAGCGGCC
60051 TCCGCCTGCC CGCCACCCTG GCCTTCGACC ACCCCACCCC CCAGGCGCTG
60101 GCGGCGTACC TCGGCACGCG CCTGAGCGGC CGGACCGCCA CCCCCGTCGC
60151 ACCCGTGGCG CCTTCCGCGG CCGCGACGGA CGAGCCGGTG GCGATCGTCG



60201 CGATGGCCTG CAAGTACCCG GGTGGAGCGA CCTCGCCGGA AGGCCTCTGG
60251 GACCTGGTCG CGGAGGGCGT GGACGCGGTC GGCGCCTTCC CGACGGGCCG
60301 CGGCTGGGAC CTCGAACGGC TCTTCCACCC CGACCCGGAC CACCCCGGCA
60351 CGAGTTACGC CGACGAAGGG GCCTTCCTTC CTGACGCGGG CGATTTCGAT
60401 GCGGCGTTCT TCGGGATCAA TCCGCGGGAG GCGCTGGCGA TGGATCCGCA
60451 GCAGCGGCTG TTGCTGGAGG CGTCGTGGGA GGTGTTGGAG CGTGCGGGTA
60501 TCGACCCGAC GACGCTCAAG GGCACCCCGA CCGGCACGTA CGTCGGCGTG
60551 ATGTACCACG ACTACGCGGC AGGCCTCGCC CAGGACGCCC AACTGGAGGG
60601 CTACTCCATG CTCGCCGGCT CCGGCAGCGT GGTGTCCGGC CGCGTCGCCT
60651 ACACCCTGGG GCTTGAGGGT CCTGCGGTGA CGGTCGACAC CGCGTGCTCC
60701 TCGTCCCTGG TCTCCATCCA CCTGGCCGCG CAAGCACTGC GACAGGGCGA
60751 GTGCACTCTC GCCCTCGCGG GCGGCGTGAC CGTCATGGCC ACGCCCGAGG
60801 TGTTCACCGG ATTCTCGCGC CAGCGCGGCC TGGCCCCCGA CGGCCGCTGC
60851 AAGCCGTTCG CCGCCGCCGC CGACGGCACC GGCTGGGGCG AGGGTpTCGG
60901 TGTGTTGTTG CTCGAGCGGT TGTCGGATGC GCGGCGTCAT GGGCGTCGGG
60951 TGTTGGGTGT GGTGCGGGGT TCGGCGGTGA ATCAGGACGG TGCGAGTAAT
61001 GGGTTGACGG CGCCGAATGG TCCGTCGCAG GAGCGGGTGA TTCGTCAGGC
61051 GTTGGCCAGT GGTGGGTTGT CGTCGGTGGA TGTTGATGTG GTGGAGGGGC
61101 ATGGGACGGG GACCACGTTG GGTGATCCGA TCGAGGCGCA GGCTCTGCTG
61151 GCCACGTATG GGCAGGGGCG TCCGGTGGAT CGTCCGTTGT GGTTGGGGTC
61201 GGTGAAGTCG AATATTGGTC ATACGCAGGC GGCTGCGGGT GTTGCGGGTG
61251 TCATCAAGAT GGTGATGGCG ATGCGGCATG GTGTGGTGCC GGCGAGTTTG
61301 CATGTGGATG TGCCGTCGCC GCATGTGGAG TGGGATTCGG GTGCGGTGCG
61351 GTTGGCGGTT GAGTCGGTGC CATGGCCGGA GGTGGAGGGT CGTCCGCGTC
61401 GGGCGGGTGT GTCGTCGTTC GGGGCTTCGG GAACGAATGC GCACGTGATC
61451 GTGGAGTCTG TGCCCGATGG GCTGGGGGAG GACTCGGTAT CGGTCAGTGG
61501 TGAGGCTCCC GAGACTGAGA CTGACGGGCG CTTGGTGCCG TGGGTGGTAT
61551 CGGCCCGCAG CCCGCAGGCC CTGCGCGACC AGGCACTACG CCTGCGTGAT
61601 GCGGTGGCGG CCGACTCAAC GGTGTCGGTG CAGGATGTGG GCTGGTCGCT
61651 GCTGAAGACG CGTGCGCTGT TCGAGCAGCG GGCGGTGGTG GTGGGGCGTG
61701 AGAGGGCTGA ACTCCTGTCG GGGCTTGCTG TGTTGGCCGC TGGCGAGGAG
61751 CACCCGGCTG TGACGCGGTC CCGTGAGGAC GGGGTTGCTG CGAGCGGTGC
61801 TGTGGTGTGG CTGTTCAGTG GTCAGGGCAG TCAGTTGGTC GGTATGGGTG
61851 CTGGTTTGTA TGAGCGGTTC CCGGTGTTTG CGGCTGCGTT TGATGAGGTG
61901 TGCGGCCTGT TGGAGGGGCC GTTGGGCGTG GAGGCGGGTG GGTTGCGGGA
61951 GGTGGTGTTC CGTGGCCCGA GGGAGCGGTT GGATCACACG ATGTGGGCGC
62001 AGGCGGGGTT GTTTGCGCTG CAGGTGGGGT TGGCCCGGTT GTGGGAGTCG
62051 GTCGGGGTGC GGCCGGATGT GGTGCTCGGG CATTCGATCG GTGAGATCGC
62101 GGCCGCGCAT GTGGCGGGGG TCTTTGATCT GGCGGATGCC TGTCGGGTGG
62151 TGGGGGCGCG GGCCCGTTTG ATGGGTGGGC TGCCTGAGGG CGGGGCGATG
62201 TGCGCGGTGC AGGCCACGCC CGCCGAGCTG GCCGCCGACG TGGACGACTC
62251 TGGTGTGAGT GTGGCGGCGG TCAACACACC TGATTCGACG GTGATTTCAG
62301 GGCCGTCTGG TGAGGTGGAT CGGATTGCTG GGGTGTGGCG GGAGCGTGGG
62351 CGTAAGACGA AGGCGCTGAG CGTCAGTCAT GCCTTCCACT CGGCGTTGAT
62401 GGAGCCGATG CTCGCGGAGT TCACCGAAGC GATACGAGAG GTCAAGTTCA
62451 CGCGGCCGAA GGTGTCGTTG ATCAGCAACG TCTCTGGTCT GGAGGCGGGT
62501 GAGGAGATCG CGTCCCCGGA GTACTGGGCA CGCCATGTAC GCCAGACAGT
62551 GCTCTTCCAG CCCGGCATCG CCCAAGTGGC TTCCACGGCA GGCGTGTTTG
62601 TCGAGCTCGG CCCCGGCCCC GTACTGACTA CTGCCGCCCA GCACACCCTG
62651 GACGACGTAA CCGATAGGCA TGGCCCCGAA CCGGTACTGG TGTCCTCGCT
62701 GGCCGGTGAG CGTCCTGAGG AGTCGGCGTT CGTGGAGGCG ATGGCTCGTC
62751 TGCATACCGC TGGTGTTGCT GTGGACTGGT CGGTGTTGTT CGCGGGTGAT
62801 CGTGTGCCTG GGCTGGTGGA GTTGCCGACG TATGCGTTCC AGCGGGAGCG
62851 GTTCTGGTTG AGCGGCCGTT CTGGGGGTGG GGATGCGGCG ACTTTGGGTC
62901 TGGTGGCGGC GGGGCATCCG TTGTTGGGTG CGGCGGTGGA GTTCGCGGAC
62951 CGGGGTGGGT GTCTGCTGAC CGGTCGGCTG TCGCGGTCTG GGGTGTCGTG



63001 GCTTGCTGAT CATGTGGTGG CGGGTGCGGT TTTGGTGCCG GGTGCTGCGT
63051 TGGTGGAGTG GGCGTTGCGG GCCGGTGATG AGGTCGGTTG TGTGACGGTG
63101 GAGGAGTTGA TGTTGCAGGC GCCTTTGGTG GTGCCTGAGG CGTCGGGTCT
63151 GCGGGTTCAG GTGGTGGTCG AGGAGGCGGG TGAGGACGGG CGGCGCGGTG
63201 TCCAGATCTA TAGCCGGCCT GACGCGGACG CCGTGAGCGG CGACGACTCG
63251 TGGATCTGCC ACGCGACCGG CACCCTCACC CCCCAGCACA CCGACGCTCC
63301 GAACGACGGA CTGGCCGGCG CGTGGCCCGC GGCGGGCGCC GTGCCGGTGG
63351 ACCTGGCGGG CTTCTACGAG CGCGTGGCGG ACGCGGGCTA TGCGTACGGC
63401 CCGGGGTTCC AGGGGCTGCG TGCCGTGTGG CGGCACGGTC AGGACCTGCT
63451 GGCCGAGGTC GTCCTGCCCG AAGCCGCGGG TGCCCATGAC GGCTACGGCA
63501 TCCACCCCGC CCTCCTCGAC GCCACCCTCC ACCCGGCCCT GCTCCTCGAC
63551 TGGCCCGGGG AGGTGCAGGA CGACGACGGG AAGGTCTGGC TGCCTTTCAC
63601 CTGGAACCAG GTCTCCTTGC GGGCTGCGGG AGCCGCCACC GTACGCGTAC
63651 GTCTCTCGCC CGGCGAGCAC GACGAGGCGG AACGGGAAGT ACAGGTACTG
63701 GTGGCCGACG CCACCGGGAC CGACGTCCTG AGCGTGGGGT CGGTGACGTT
63751 GCGTCCCGCC GACATCCGGC AACTGCAGGC CGTGCCGGGT CACGACGACG
63801 GTCTGTTCTC GGTGGACTGG ACGCCGCTGC CGCTGTCGCG GACGGATGTG
63851 TCGCAGACGG ATGCCGACGG GGATGCCGAC TGGGTTGTGC TCTCGGACGG
63901 TGTCGGCAGC CTGGCTGATG TGGTGTCGGC GGCGGGTGGT GAAGCGCCGT
63951 GGGCAGTGGT CGCTCCCGTC GGTGCATCCG CGGGCGGCGG CCTTGCCGGC
64001 TTTGACCGCC GTGAGGGTCT TGACGGTCGG CTGGTCGTGG AGCGGGTGTT
64051 GTCACTCGTA CAGGAGTTCC TGGCCGCGCC GGAGCTGGCC GAGTCCCGGC
64101 TCCTCGTGCT GACCCGCGGC GCCGTGGCGA CCGGCGGCGA CGGCGACGGT
64151 GATGTGGACG CGTCCGCCGC AGCCGTATGG GGCCTGGTCC GCAGTGCTCA
64201 GTCCGAGAAC CCGGGCCGCT TCATCCTGCT CGACGTGGAC ATGGACGTGG
64251 ACGTCGACGT GGACATGGAC GTGGACGTCG ACGTGGACGT CGACGTGGAC
64301 GTGGACGGAG ACGGCAATGG CAGCGACCTG GACCCGGACC TGAACGGCCG
64351 ACGACTTCCC CACGCCACCC TGCGTCACGC CGCCGAGGAA CTCGACGAGC
64401 CCCAACTCGC CCTGCGCGAC GGACAACTGC TCGTTCCGCG GCTGGTCCGC
64451 GCCACCGGCG GCGGACTCGT CGTGGCGCCC ACCGACCGTG CCTGGCGCCT
64501 GGACAAGGGA AGCGCCGAGA CGCTGGAGAG CGTCGCGCCG GTCGCGTACC
64551 CCGGAGTCAT GGAACCCCTG GGCCCCGGCC AGGTCCGCCT CGGCATCCAC
64601 GCCGCGGGCA TCAACTTCCG CGACGTCCTG GTCAGCCTCG GCATGGTGCC
64651 CGGCCAGGTC GGCCTGGGCG GCGAAGGCGC CGGTGTCGTG ACGGAGACAG
64701 GCCCCGATGT CACCCACCTG TCGGTCGGCG ACCGCGTGAT GGGCGTCCTC
64751 CACGGCTCCT TCGGCCCGAC GGCCGTGGCG GACACCCGCA TGGTCGCGCC
64801 GGTTCCGCAG GGCTGGGACA TGCGGCAGGC GGCCGCGATG CCCGTCGCGT
64851 ATCTGACGGC TTGGTACGGG TTGGTGGAGC TGGCCGGTCT GAAGGCGGGC
64901 GAGCGCGTGC TGATCCACGC AGCCACGGGT GGTGTGGGAA TGGCGGCGGT
64951 GCAGATCGCC CGTCACCTGG GTGCCGAGGT GTTCGCCACC GCCAGTGCAG
65001 CCAAGCACGT CGTACTGGAA GAGATGGGCA TCGACGCCGC CCACCGCGCC
65051 TCCTCCCGGG ACCTCGCCTT CGAGGACACC TTCCGGCAGG CCACCGACGG
65101 GCGCGGCATG GACGTCGTCC TCAACAGCCT GACCGGCGAG TTCATCGACG
65151 CATCTCTGCG GTTGCTCGGC GACGGCGGCC GGTTCCTGGA GATGGGCAAG
65201 ACCGATGTGC GCACGCCGGA GGAGGTGGCC GCGGAGTACC CGGGTGTCAC
65251 CTACACCGTG TACGACCTCG TCACCGACGC GGGGCCGGAT CGCATCGCGG
65301 TCATGATGAG TGAGCTGGGC GAGAGGTTCG CTTCCGGTGC CCTTGACCCT
65351 CTGCCGGTGC GTTCCTGGCC GCTGGACAAG GCGCGTGAGG CGTTCCGGTT
65401 CATGAGTCAG GCCAAGCACA CCGGCAAACT CGTACTCGAC GTGCCCGCAC
65451 CGCTCGACCC CGACGGGACC GTCCTGATCA CCGGAGGCAC GGGGGCGCTG
65501 GGGCAGGTCG TGGCCGAGCA TCTGGTGCGG GAGTGGGGCG TACGGCACCT
65551 GCTGCTGGCC AGCCGCCGTG GACTGGACGC CCCCGGCAGC GGTGAACTCG
65601 CCGACAGGCT GTCGGACTTG GGCGCCGAGG TGACCGTCGC GGCGGCCGAT
65651 GTGAGCGACC CGGCCTCGGT GGTGGAGCTG GTCGGCAAGA CGGATCCCTC
65701 GCATCCGTTG ACGGGTGTCG TGCACGCGGC GGGCGTGCTT GAGGACGGGA
65751 TCGTGACGGC TCAGACGCCT GAGGGGCTGG CGCGGGTGTG GGCGGCCAAG


65801 GCCGCTGCGG CGGCGAATCT CCATGAGGCG ACCCGGGAGA TGCGTCTCGG
65851 TCTGTTCGTG GTGTTCTCCT CGGCGGCCGC CACGCTCGGC AGTCCGGGCC
65901 AGGCCAACTA CGCGGCTGCC AATGCCTATT GTGACGCGCT GATGCAGCGC
65951 CGACGGGCGG CGGGCCAGGT CGGCCTGTCG GTCGGCTGGG GTCTCTGGGA
66001 GGCACCGGAC GCCAAGCCGG GTGTTGCCGC CGACGCCAAA CCGGATGTTG
66051 CCGCCGACGC CAAGACGGGA GTTGCCGCCG ACGGCACTCC CCAGGGCATG
66101 ACCGGCACCC TGAGCGGCAC CGACGTGGCC CGCATGGCAC GCATCGGCGT
66151 CAAGGCGATG ACCAGCGCAC ACGGTCTCGC CCTGCTCGAC GCCGCACACC
66201 GCCACGGCCG CCCCCACCTC GTCGCCGTCG ACCTCGACAC CCGCGTCCTG
66251 GCGCACAAAC CCGCCCCGGC CCTCCCCGCC CTCCTGCGCG CCTTCGCCGG
66301 AGACCAGGGA GGCCAGGGAG GCGGCCGAGG CGGCGGTCGG GGCGGCGGCC
66351 CGGCACGACC GGCGGCCGCC ACCACCCGGC AGAACGTCGA CTGGGCCGCG
66401 AAGCTCTCCG TCCTGACAGC CGAGGAACAG CACCGCACCC TCCTCGACCT
66451 GGTACGGACG CACGCGGCAG CCGTCCTCGG GCACGCGGGC ACCGACGCCG
66501 TACGCGCCGA CGCCGCCTTC CAGGATCTCG GCTTCGACTC CCTCACCGCG
66551 GTCGAACTGC GCAACCGCCT CTCCGCCTCC ACCGGCCTGC GCCTGCCCGC
66601 CACGTTCATC TTCCGGCACC CGACCCCGTC GGCCATCGCC GACGAACTGC
66651 GCGCACAGCT GGCCCCCGCG GGGGCCGACC CGGCCGCGCC GCTCTTCGGT
66701 GAACTGGACA AGCTGGAGAC GGTGATCACG GGGCACGCGC ACGACGAGAG
66751 CACCCGGACC CGCCTGGCGG CACGCCTGCA GAACCTGCTG TGGCGCCTGG
66801 ACGACACTTC GGCCCGCTCG GACCACGCGG CCGGCGCGAG CGACGCCGAC
66851 GGCGACGCCG TCGAGAACCG AGACCTCGAG TCCGCGTCGG ACGACGAGCT
66901 CTTCGAGCTG ATCGACCGAG AACTGCCTTC TTGATCAGGA GTGGAGAAGA
66951 CATGCCGGGT ACGAACGACA TGCCGGGTAC CGAGGACAAG CTCCGCCACT
67001 ACCTGAAGCG AGTGACCGCG GATCTCGGAC AGACCCGTCA GCGCCTGCGC
67051 GACGTGGAGG AGCGCCAGCG GGAACCGATC GCCATCGTCG CGATGGCCTG
67101 CCGCTACCCG GGCGGGGTGG CCTCCCCCGA GCAGCTGTGG GACCTGGTCG
67151 CCTCACGCGG CGACGCCATC GAGGAGTTCC CCGCCGACCG CGGCTGGGAC
67201 GTGGCGGGCC TCTACCACCC CGACCCGGAC CACCCCGGCA CGACCTATGT
67251 ACGAGAGGCC GGATTCCTGC GGGACGCCGC CCGCTTCGAC GCCGACTTCT
67301 TCGGCATCAA CCCGCGCGAG GCGCTCGCCG CCGACCCGCA GCAACGGGTG
67351 CTCCTCGAAG TGTCGTGGGA ACTGTTCGAG CGGGCGGGCA TCGACCCCGC
67401 CACGCTCAAG GACACCCTCA CCGGCGTGTA CGCGGGGGTG TCCAGCCAGG
67451 ACCACATGTC CGGGAGCCGG GTCCCGCCGG AGGTCGAGGG CTACGCCACC
67501 ACGGGAACCC TCTCCAGCGT CATCTCCGGC CGCATCGCCT ACACCTTCGG
67551 CCTGGAGGGC CCGGCGGTGA CGCTCGACAC GGCGTGCTCG GCATCGCTGG
67601 TCGCGATCCA CCTCGCCTGC CAGGCCCTGC GCCAGGGCGA CTGCGGCCTG
67651 GCGGTGGCGG GAGGCGTGAC CGTACTGTCC ACGCCGACGG CGTTCGTGGA
67701 GTTCTCACGC CAGCGCGGAC TCGCACCGGA CGGCCGCTGC AAGCCGTTCG
67751 CCGAGGCCGC CGACGGCACC GGATTCTCCG AGGGCGTCGG CCTGATCCTC
67801 CTGGAACGCC TCTCCGACGC CCGCCGCAAC GGACATCAAG TACTCGGCGT
67851 CGTACGCGGA TCGGCCGTCA ACCAGGACGG CGCGAGCAAC GGCCTGACCG
67901 CCCCGAACGA CGTCGCCCAG GAACGCGTGA TCCGCCAGGC CCTGACCAAC
67951 GCCCGCGTCA CCCCGGACGC CGTCGACGCC GTGGAGGCAC ACGGCACCGG
68001 CACCACGCTC GGCGACCCGA TCGAGGGGAA CGCACTCCTC GCGACGTACG
68051 GAAAGGACCG CCCCGCCGAC CGGCCGCTGT GGCTCGGCTC TGTGAAGTCG
68101 AACATCGGCC ACACGCAGGC GGCTGCGGGC GTCGCAGGCG TCATCAAGAT
68151 GGTGATGGCG ATGCGCCACG GCGAGCTGCC CGCCTCCCTG CACATCGACC
68201 GGCCCACGCC CCACGTGGAC TGGGAGGGCG GGGGAGTGCG GTTGCTCACC
68251 GATCCCGTGC CGTGGCCACG GGCCGACCGC CCCCGCCGCG CGGGGGTCTC
68301 CTCCTTCGGC ATCAGCGGCA CCAACGCCCA CCTGATCGTG GAACAGGCCC
68351 CCGCCCCGCC CGACACGGCC GACGACGCCC CGGAAGGCGC CGCAACCCCC
68401 GGCGCTTCCG ACGGCCTCGT GGTGCCGTGG GTGGTGTCGG CCCGTAGTCC
68451 GCAGGCCCTG CGTGATCAGG CCCTGCGTCT GCGCGACTTT GCCGGTGACG
68501 CGTCCCGAGC GCCGCTCACC GACGTGGGCT GGTCTTTGCT GCGGTCGCGT
68551 GCGCTGTTCG AGCAGCGGGC GGTGGTGGCG GGGCGTGAGA GGGCTGAACT
68601 GCTGGCGGGG CTGGCTGCGT TGGCCGCTGG TGAGGAGCAC CCGGCTGTGA
68651 CGCGGTCCCG TGAGGAAGCG GCGGTTGCTG CGAGCGGTGA TGTGGTGTGG
68701 CTGTTCAGTG GTCAGGGCAG TCAGTTGGTC GGTATGGGTG CTGGTTTGTA
68751 TGAGCGGTTC CCGGTGTTTG CGGCTGCGTT TGATGAGGTG TGCGGCTTGC
68801 TGGAGGGGGA GCTGGGGGTT GGTTCGGGTG GGTTGCGGGA GGTGGTGTTC
68851 TGGGGCCCGC GGGAGCGGTT GGATCACACG GTGTGGGCGC AGGCGGGGTT
68901 GTTTGCGTTG CAGGTGGGGT TGGCCCGGTT GTGGGAGTCG GTCGGGGTGC
68951 GGCCGGATGT GGTGCTCGGG CATTCGATCG GTGAGATCGC GGCCGCGCAT
69001 GTGGCGGGGG TCTTTGATCT GGCGGATGCG TGTCGGGTGG TGGGGGCGCG
69051 GGCGCGTTTG ATGGGTGGGT TGCCTGAGGG TGGGGCGATG TGTGCGGTGC
69101 AGGCCACGCC CGCCGAGCTG GCCGCGGATG TGGATGGCTC GTCCGTGAGT
69151 GTGGCGGCGG TCAACACACC TGACTCGACG GTGATTTCAG GTCCGTCGGG
69201 TGAGGTGGAT CGGATTGCTG GGGTGTGGCG GGAGCGTGGG CGTAAGACGA
69251 AGGCGCTGAG CGTGAGTCAT GCTTTCCATT CGGCGTTGAT GGAGCCGATG
69301 CTCGGGGAGT TCACGGAAGC GATACGAGGG GTCAAGTTCA GGCAGCCGTC
69351 GATCCCGCTC ATGAGCAATG TCTCCGGAGA GCGGGCCGGC GAGGAGATCA
69401 CATCCCCGGA GTACTGGGCG AGGCATGTAC GCCAGACAGT GCTCTTCCAG
69451 CCCGGCGTCG CCCAAGTGGC CGCTGAGGCA CGCGCGTTCG TCGAACTCGG
69501 CCCCGGCCCC GTACTGACCG CCGCCGCCCA GCACACCCTC GACCACATCA
69551 CCGAGCCGGA AGGCCCCGAG CCGGTCGTCA CCGCGTCCCT CCACCCCGAC
69601 CGGCCGGACG ACGTGGCCTT CGCGCACGCC ATGGCCGACC TCCACGTCGC
69651 CGGTATCAGC GTGGACTGGT CGGCGTACTT CCCTGACGAC CCCGCCCCCC
69701 GCACCGTCGA CCTGCCCACC TACGCCTTCC AGGGGCGGCG CTTCTGGCTG
69751 GCGGACATCG CGGCGCCCGA GGCCGTGTCC TCGACGGACG GTGAGGAGGC
69801 CGGGTTCTGG GCCGCCGTCG AAGGTGCGGA CTTCCAGGCG CTCTGCGACA
69851 CCCTGCACCT CAAGGACGAC GAGCACCGCG CGGCTCTGGA GACGGTGTTC
69901 CCCGCGCTGT CCGCGTGGCG GCGCGAACGA CGTGAGCGGT CGATCGTCGA
69951 TGCCTGGCGG TACCGGGTCG ACTGGCGGCG CGTCGAGCTG CCGACACCCG



70001 TTCCGGGCGC CGGTACCGGT CCCGACGCCG ACACGGGCCT CGGGGCGTGG
70051 CTGATCGTGG CTCCCACGCA CGGGTCGGGT ACTTGGCCGC AAGCCTGTGC
70101 CCGGGCGTTG GAGGAGGCGG GCGCGCCGGT ACGTATCGTC GAGGCCGGCC
70151 CGCACGCCGA CCGGGCGGAC ATGGCGGACC TGGTCCAGGC ATGGCGGGCA
70201 AGCTGTGCGG ACGACACCAC CCAGCTCGGA GGAGTGCTCT CCCTGCTGGC
70251 TCTCGCCGAG GCACCGGCCA CCAGTTCCGA CACCACTTCC CACACCAGTA
70301 CCAGTTGCGG TACCGGCTCT CTCGCGTCCC ACGGCCTCAC CGGCACCTTG
70351 ACGCTGCTGC ACGGTCTGCT GGATGCGGGC GTCGAAGCGC CTCTCTGGTG
70401 TGCCACGCGC GGCGCCGTGT CGTGCGGCGA CGCCGATCCG CTCGTCTCCC
70451 CGTCGCAGGC CCCGGTCTGG GGACTCGGAC GCGTGGCCGC CCTGGAGCAT
70501 CCGGAGTTGT GGGGCGGCCT GGTCGACCTG CCCGCCGACC CGGAGTCGCT
70551 CGACGCGAGC GCGTTGTATG CGGTTCTGCG CGGAGACGGC GGCGAGGATC
70601 AGGTCGCGCT GCGCCGGGGC GCGGTCCTCG GCCGTCGCCT GGTGCCCGAC
70651 GCAACCCCGG ACGTGGCCCC CGGCTCGTCC CCGGACGTGT CCGGAGGCGC
70701 AGCCCATGCC GACGCGACCT CCGGGGAGTG GCAGCCGCAT GGTGCCGTCC
70751 TCGTCACCGG AGGCGTCGGC CACCTGGCCG ATCAGGTCGT ACGGTGGCTC
70801 GCCGCGTCCG GCGCCGAACA CGTCGTACTC CTGGACACGG GCCCCGCCAA
70851 CAGCCGTGGT CCCGGCCGGA ACGACGACCT CGCCGCGGAA GCCGCCGAAC
70901 ACGGCACCGA GCTGACGGTC CTGCGGTCCC TGAGCGAGCT GACAGACGTA
70951 TCCGTACGTC CCATACGGAC CGTCATCCAC ACATCGCTGC CCGGCGAGCT
71001 CGCGCCGCTG GCCGAGGTCA CCCCCGACGC GCTCGGCGCG GCCGTGTCCG
71051 CCGCCGCGCG GCTGAGCGAA CTCCCCGGCA TCGGGTCAGT GGAGACCGTG
71101 CTGTTCTTCT CCTCCGTGAC GGCTTCGCTC GGCAGTAGGG AGCACGGCGC
71151 GTACGCCGCC GCCAACGCCT ACCTCGACGC CCTGGCGCAA CGGGCCGGTG
71201 CCGATGCTGC GAGCCCCCGG ACGGTCTCGG TCGGGTGGGG CATCTGGGAT
71251 CTGCCGGACG ACGGTGACGT GGCACGCGGC GCCGCCGGGC TGTCCCGGAG
71301 GCAGGGACTC CCGCCGCTGG AACCGCAGTT GGCGCTCGGC GCCCTGCGCG
71351 CGGCGCTCGA CGGGGGCAAG GGGCACACGC TGGTCGCCGA CATCGAGTGG



71401 GAGCGGTTCG CGCCGCTGTT CACGCTGGCC AGGCCCACCC GGCTGCTCGA
71451 CGGGATCCCC GCGGCCCAGC GGGTCCTCGA CGCCTCCTCG GAGAGCGCCG
71501 AGGCCTCGGA GAACGCCTCG GCCCTCCGTC GCGAACTGAC GGCCCTGCCC
71551 GTGCGGGAGC GGACCGGGGC ACTTCTCGAC CTGGTCCGCA AACAGGTGGC
71601 CGCCGTCCTG CGCTACGAGC CGGGCCAAGA CGTGGCGCCC GAGAAGGCCT
71651 TCAAGGACCT GGGCTTCGAC TCGCTCGTGG TCGTGGAGCT GCGCAACCGG
71701 CTGCGCGCCG CCACCGGGCT CCGGCTGCCC GCCACCCTGG TCTACGACTA
71751 CCCCACACCC CGCACCCTCG CCGCACACCT GCTGGACAGG GTGCTGCCCG
71801 ACGGCGGCGC GGCAGAGCTC CCCGTGGCCG CCCACCTGGA CGACCTGGAG
71851 GCGGCCCTCA CCGACCTGCC GGCCGACGAC CCCCGGCGCA AGGGCCTGGT
71901 CCGGCGTCTA CAGACGCTGC TGTGGAAGCA GCCCGACGCC ATGGGGGCGG
71951 CGGGCCCCGC CGACGAGGAG GAGCAAGCCG CGCCCGAGGA CCTGTCGACC
72001 GCGAGCGCCG ACGACATGTT CGCCCTGATC GACCGGGAGT GGGGCACGCG
72051 GTGAGCGGGG TGGAGCGGGG TGTGGGGTCG GCGGGCCCTG TGGAACAGGG
72101 TGACGGACTC GCGGGCCTGG TCGAGCGGGC CGAGGCGCTG GCCGCTCTGC
72151 GGGGCGCCTT CGACGGCTCC CCGGGCACCG GCGGCAGCCT CGTCGTGCTC
72201 AGCGGCGCGG TGGGCACCGG CAAGACCGCG CTGCTACGGG CGTGGGCCGA
72251 CCGCATCGGC GCCGATGCCG ACGCCCTGGT CCTGACCGCC ACCGCCTGCC
72301 GCGCCGAGCG CGACCTGCCG CTTGGCGTCC TGGAACAGCT GGTACGCAGC
72351 CCCGGCCTGC CCCCGGCCAG CGCCGAGCGC GCGCTGGCGT GGTGGGACGA
72401 GGAGGCCTCG GCCACCCCCG GAAAGACGGA CGCGAACGGG ACGAGTGCCA
72451 ACGGGACGGA CGCCAACGGG ACGGGCGCGG GACAGACGGG CGCGGGGCAG
72501 GCGGGCGTGG GACAGACGGG CGTGGGCGGA GAGCCCGTCC TGGCCGCCTC
72551 CGCCCTGCGA GGCCTGTGCG AGGTGCTGCG GGACCTGCTC GCCGAGCGGC
72601 CCGTCGTGGT CGCCGTCGAC GACGCGCACC ATGCCGACGC GGCGTCGCTC
72651 CAGTGCCTGC TCTCCGTGGT GCGCCGGCTG CGGTCGGCAC GACTCCATGT
72701 GCTGTTCACC GAGTACGCCC ATCAGAAGGC GCAGAACGCC CTGCTGAGCA
72751 GCGAGTTCCT GCACGAGCCC GCCCTGCGGC GGATCCGCCT GGAACCGCTG
72801 TCGAAGGCGG GCGTGGAGGC CTTGCTCGCC CGGCACCTCG ACGAGCGGAC
72851 GGCACAAGAC CTCACCCCCG TCGTCCACGG CATGAGCGCG GGCCACCCGC
72901 TCCTCGTACG GGCGCTGGCC GAGGACCACC GTGCGGCGGG CGGCGCCGGG
72951 GAGGCGTACG GTCGTGCCGT CCTCAGCTTT CTGTACCGGC ACGAGACTCC
73001 GGTCACCCAA GTCGCCCGCG CCATCGCTGC GTTGGGCGCG CACGCCGGAC
73051 CCGGTCAGGT CGGGCGGCTG CTCGATGTCG ACGCGGCGTC CGTCGAGCGG
73101 GCCGTGCGGC AGCTGACCGT CGCGGAGGTG CTGCACGAGG GCCGCCTGTG
73151 CCACCCGGCG TTCGCGGCGG CGGTCCTGGA CGGCATGCCG CCCGAGGAAC
73201 GCCGCGCCCT GCACGGACGG GTCGCCGACC TCCTGCACGA GGAGGGGGCG
73251 CCGGCCACCG AAGTGGCCGC CCACCTCGTC GCCGCCGACC GGTCCGACGC
73301 CCCGTGGGCG GTACCCGTCT TCCAGGAAGC GGCCCAACTC GCCCTGGACG
73351 AGGACCAGGT GGAGACCGGC GTCGACTATC TGCGCGCGGC CCACCAGCGG
73401 TGCCGGGGCG CCGCGCAGCG TGCCGCGGTC GTCGGTGCGC TCGCCGACGC
73451 CGAGTGGCGG CTCGACCCAG CAAAGGTCCT GCGCCACCTG CCCGACCCTG
73501 CAGCCATGGC CCCACAAACG GACCCTGCCG CCCTGGCCCC ACACACGGAC
73551 CCCGCACCCA CAGCCGCACC CACAGCCGCC CCCACCCCCA CCCCCATCCC
73601 GACCACCCCA CCCCTCCCCA CCCACCTGCT CTGGCACGGG CGGGTCGAGG
73651 AAGGCCTGGA CGCCATCGGC ACGCTCACCG GGCCCGGACC CAACCCGGCG
73701 GGTGCGCCGC CGATGAACCC CGCGGACCTG GACACCCCAT GGCTGTGGGG
73751 CGCCTACCTC TATCCCGGGC ACGTCAAGGA GCGCCTGGGA TCCGGCGCCC
73801 TGTCCCCGCA GCGCTCGACC CCGCCGGCGG TCACGCCGGA GCTCCAAGGC
73851 GCGGGCACGC TGATGAACGA CCTGCTGCAC GGCGGCGAAC GCGACGCCAC
73901 CGAGGCCGCC GAGCGCGCCC TCAACCGCTA CCGGCTCGGC CCCCGCACCA
73951 TCGCGGTCCA GACGGCCGCG CTGGCCGCCC TCACCTACCG CGACCGGCCG
74001 CACCGCGCGG CCGCCTGGTG CGACGGCCTC GTCGCCCAGG CCGACGAGCG
74051 CAACAGCCCC ACCTGGCGGG CCCTGTTCAC CGCGTGGCGT GCCCTGCTCC
74101 ACCTGCGGCA GGGCGACCCG GCCGCAGCGG AACAGCGCGC CGAAACCGCC
74151 CTCGCCCTGC TCGGATCGAA GGGCTGGGGC GCCGCGATCG GCCTGCCGCT
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742 01 GGCAGCCGCC GTACAGGCCA AGGCGGCCCT CGGCGATGTC GACGGGGCGG 
74251 CGGCCCTCCT GGAACGGCCC GTGCCCCAGG CGGTCTTCCA GACCCGCACC 

743 01 GGACTGCACT ACCTGGCGGC CCGGGGCCGC TATCACCTCG CCACCGGCTG 

743 51 CCACTACGCC GCACTGTGCG ACTTCTACGC CTGCGGGACC CGCATGAGCA 

744 01 GCTGGGGAGT GGACCTGCCC GCGCTGGAGC CGTGGCGCCT CGGCGCGGCG 
744 51 GAAGCGTACC TGGCCCTCGG CGAAGGACTC CTGGCACGCC AACTCGTCGA 
74 5 01 CGGCCAGCTG CCGTTGCCCA CGCCTGACGA CGGCCGCACC TGGGG CATG A 
74 551 CGTTGCGCCT GCGGGCGGCC ACGTCCCCCG CGCCGGCCCG GGCCGAACTC 
746 01 CTCGACGAGG CCGTGGCGGT GCTC CGGGAG AGCGGCGACA CCTTCGAGCT 
74651 GGCGCGGGCC GTCGCCGACC AGGCTGTTGC CGTACGCGAA GGGGGCGAGG 
74 7 01 CGGAACGCGC CCGGCTGCTG GCCCGCAAGG CGGAGCTGCT GGCCCGGCGC 
74 751 TGGGGCAGCG CCCCCGCGCC CGCCACCGTC CCCGAACCGC CGGAGCGGCC 
74801 AGGACCGGCC ACTCCGGACG C CGAACTG AC CAGTGCGGAG CGGAGGGTGG 
74851 CCGAGCTGGC CGCCGAAGGG TTCACCAACC GGGAGATCTC CCGGAAGCTG 
74 901 TGCGTCACGG TCAGCACCGT GGAACAGCAC CTGACCCGGA TCTACCGGAA 
74 951 GCTCGACGTC AGGCGACTGG ACCTCCAGGC AGCCCTCGGC TGACCTTCAG 
75001 GCGGCCCTCG GCTGACCGCA GGCCACGCGC CTACGGTCAG CCTTCCTGAG 
7 5 051 TCAGGACCGT ACAGCCGCCG TAGGTGTAGG TGTAGGCGTG GGCGAGATCG 
75101 TCGCCGCGTC CAGACCCACC ACGGCCAGCT CCTCCGGAAG GAACGGGGGA 
75151 GCGGTCAGCT CCGGGAGGCG TTCGTCGGCG CGCATCGCCA TCAGGAAACG 
75201 GTTGGAGCCC AGTTCGGCCT GCGGCGCGTT GAGGCTCATC ACGTCCGTGA 
7 5251 CGATCTCGGA CGCCTTCGGG GAACGGATCG ACGCCGCGGT GATGGCCTCG 

753 01 GCGAACCGCA GACGCTGCTC GGTGTC CAC A CCGATGAGCC GCGGATCCGT 
7 5351 CGCCGAGACA CGGCAGTTGA CGTAGTCGAT GTCCTTGGTC GCGGCGAGGA 
75401 TCCACGGGTC GTCCACGGCC GCGCCGATCG CCTTCTGCAG GGCGCGGGTG 

754 51 CCGGCGCGGG CGGACCCCGT ACCCTCCTGC ACGCTCCGCT CGAACTCGCG 
7 5501 GTCGATCGTG GTGGCGCAGC GCGCGGCCGA GCTCATGCCG TGGCCGTAGA 
75551 TCGGGTTGAA AGCGGTCAGC GAGTCGCCGA TGACGAGCAG ACCGTCGGGC
75601 CACTGTTCGA GGCGCTCCGG ATAGAGGCGG CGGTTGGCGC CGGAGCGGGA
75651 ACCGAAGACG GGGGTGAGTG GTTCGGCGTC CCGGAGCAGG TCGGCGAGGA
75701 TCGGGTGGTT CAGGTTCTCG GCGAAGGGGA TGAACTCGTC CTCGTGTGTG
75751 GGCAGTTGCG CGCCCCGCGT GCAGGAGAGC GTCGCGAGCC AGCGGCCGCC
75801 CTCGATGGGG TAGACCACGC CGAAGCGGCC GGGTTCGCGC ACCCGGTCGT
75851 CGGCGGCGAT GTTCACGGCG GGGAAGTGCG TCGTAGCGCC CGGCGGGGCC
75901 TTGAAGAGCC GGGTGGCGTA GGCGACGCCC GCGTCCACGA CGTCTTCCTC
75951 CAGTGCCGGC ACGCCGAGGG CGGCGAGCCA CTGCTTGAGG CGGGAGCCGC
76001 GCCCGGTGGC GTCGATCACC AGGTCGGCCT CCAGCTGCTC CTGCCGACCG
76051 CTGTCGAGGT CGCGGACGAC GACACCGGTG ACCCGGCCGC CACTGCCACC
76101 ACCACTTCCC GTCAGCTCGA CGGCCTCGGT GCGCTGCCGG ACGGTGATGT
76151 TGTCGGCTCC CAAGGCCTGC TGACGTACCG TCAAGTCCAG CAGCGGGCGG
76201 CTGGCGACCA GCGCGAACTG GGTGGCGGGG AAGCGGTGCT GCCACCCCTG
76251 ACCGGTCAGC GTCACCAGGT CCTCGGGGAA GCCGAGGCGG CGGGCGCCGG
76301 CCGCGAGGAG GCGGTCGGTG GTGCCGGGCA GCATCTCCTC GATGAGGCGG
76351 GCGCCGTTGG ACCACAGGAG GTGCGCGTGG CGGGCCTGCG GGACCCCCTT
76401 GCGGTGCTGG GGCTCCTCGG GCAGCGCGTC ACGTTCCACG ACGGTGACGG
76451 CGTCGACGTG CCGGGCCAGG ACGTGGGCCG CCAGGGTGCC TGCCATGCTG
76501 GCACCCAGGA CGACGGCATG TGCGGGTCGG GTGGTGGTCA CGCGCGTATC
76551 CCTTCGGGGT GGGTGGTGTC GGCGGGCCCG GCCGGATCGT CCATGGTCAC
76601 GTCCGTGACG CCCCAGAACG CCTGGACCCG GCGGCCGAGC CCGTGCTCGT
76651 CGAGTTCGAC GATGCCGACG ATGCGGAAGG TCATCGGCCG CGGCCGCTGC
76701 ACGGTGACCG TGGTCGGCGT CACCACGAAA CGGTCGTCCA TCGACGTCAT
76751 CGGCGGGTCC GGCACCTCGT GCGTACCGCA GGAGACGGCC AGTTCGAGAT
76801 GGCGGCGGAG ATCGTCCTTG CCCACCATCG GGGGCCGCCC CACGGGGTCC
76851 TCGAAGACGA TGTCGTCCGT GAACAGGTCG AGGACGCCTT CGATGTCACC
76901 GGCGTTGATG CGCTCGGCGT AGTCGACGGC CATCTGCTTG CGCGCGGCCT
76951 CGTCGGGCAT GGCACCTCCA GGAAGGGTGG GCAGACCTTG TGAAAGTCAT
77001 CGAGGGCCGT TCGGTTCAGC CGAGGACCGT GAGATCGGAT GTGCCCCAGT
77051 ACGACTTCAG ATGCCGGATG AGGCCGGACG CGTCCATGCG GATCACGAGC
77101 ATCGCCGTGC GGTGTATGCG GGCCGTCCCC GGGGCGTCGG GGGCCTTGAG
77151 CCAGCCCCGC TCCGCGTAGA GCGGGCCCAC GGGCAGGTAG TCCATGACGG
77201 AGGAAATCTG GATCAGCGCG TGCGTGGCGT CCTGCCCGGC GACGGGCTCG
77251 GCCGCCTCCT CGCGCAGGTG CGCGGCGAGC AGCGGTTCGT AGTGGGCGCG
77301 CAGCGCGTCG TGCCCGGTGA CGGGCGGGAG GCCGACCGGG TCCTCGAGGA
77351 CCGCGTCGGG CGCGTACAGA TCGATGATCG CGTCCAGGTC CCCGGCGTTG
77401 ATCCGCCGGC TGTGCTCCAG GGCCCGCTTC TTGCGGGCGA ACTCGTTCAT
77451 CGCTGCCCCT CCACTGCCTG ACCGTGTCCG TTGCCGTTGC CGTTGCCGTT
77501 GCCGTTGCCG TGTCCGTTGC CCTGCCCGGT GGGCTGTCCG TTGCCCTGTC
77551 CGCTCGCGCC GTCCCTGCCG AGGTCCCGGT CGATGAACGC GAAGATCTCG
77601 TCCGCCGACG CGTCCTGGAT ACGTGTACGA GTGGCCACCG CGACCTCGCC
77651 GGCCGTGTCC TGCGGCGCGT CGAGCCTGGC CAGCGTCGCG CGCAGCCGCC
77701 CCGCCAGTTC GGCCCGCGCC GAGCCGTCCT TCGAGGAGAC CGAGAGCAGC
77751 GAGTCCTCGA TGCGCTCGAA CTCCGCCAGG ACGTCGGCGA GCGGATCCGC
77801 CGCGCGCGGG GCCAGCTCCT GCCGCAGCTG CGCGGCGAGC TCCGCCGGGT
77851 TGGGATGGTC GAAGACGAAC GTGGCGGGCA GCTTCAGCCC CGTCGCGGCC
77901 GAGAGCCGGT TGCGCAGCTC CACCGCGGTC AGGGAGTCGA AGCCGAGTTC
77951 CCGCAGCCCC TGCGTCGCGT TGACGGGCGT GGCCGCGTCG TAGCCGAGGA
78001 CGGCCGCGAT ATGGGTGCAC ACCAGGTCGA GCAGCGCCTC CTCCCGCTCG
78051 GGGTCGGACA TCGCGCCGAG CGACTTGAGC AGCGCGGCCG CCCCCGCCGA
78101 CACGGCACCG CCGCCGCTCT TGCTCCCCCC GCGCACCAGG TCGCGCAGCA
78151 GCGCCGGTGC GGGGTGGCTC TGGGCCTGCC GGCGCATCCG GGCCAGGTCC
78201 AGACGGACCG GCGCGTACAG GGGCAGTCCG CCGGCCCACG CCGCGTCGAG
78251 GAGGGCGAGT CCCTCGTCGG CGCCGAGCCC GACCACGCCG GCGCGGGCAT
78301 GGCGCGCCCG GTCGGCGTCG GTGAGCCGTC CCGACATGCC GCTCGCCAGC
78351 TCCCAGTAGC CCCACGCCAG GGAGGTCGCC GCCGCACCGC CGTCGTGCCG



78401 GTGCCGGGCC AGCGCGTCCA AGAAGGCGTT GGCGGCCGTG TAGCTGCCCT
78451 GGCCGGGGCC GCCGAGCAGC CCGGCGACCG AGGAGTACAG GACGAACGCG
78501 GACAGGTCCG CGTCCCGCGT CAGCTCGTGC AGGTGCCACG CGGCGTCCGC
78551 CTTCACGCGC ATCACCTCCT CGACCTGCTC GGCCGTGAGG TTCTGCACCA
78601 CGGCGTCGTT CACGGTGCCC GCGCAGTGGA AGACGGCGGT CAGCGGGTGG
78651 TCCGAGGGCA CCGCCGCGAG GAGGGCGGCG GCTTCGTCCC GGTCGCCCGG
78701 GTCGCACGCG GCGAAGGTGA CTCGCGCGCC GAGCGCGGAG AGGTCGGCGG
78751 CCAGTTCGAG TGCGCCCGGC GCGTCGGCTC CCCGCCTGCT GGACAGCAAC
78801 AGGTGCCTGG CTCCGTACCG TTCCACCAGG TGACGGGCCG TCAGCGAGCC
78851 GAGTGCTCCG GTGCCGCCGG TGACCAGCAC GGTGCCCTCG GGGTCGAAGG
78901 CGGGAGGCAG CGAGAACACG GTCGTGCCCG CCGAGGGCGG GGCCGCCATC
78951 GCGGCGGGCG CCTGCCGGAT GTCCCACACG GTGATGTCGA GCGGCGTCAG
79001 AGCACGGCTG TCACCCCGTT CCGCGGGCAG CCCGGGCTCC GCCGACTCCG
79051 TGATCTCGGC AAGCTCGGTC AGCTCGGTCA GCTCCGCGAG GATTTCCCGT
79101 ACGCGCCCGG GCTCGGGCGG CACGACAGCC TGTCCCTCGT CCGGACGACC
79151 GCCCGCACGG TGGACCACCA GGGCCCCCTC GTGGCGGAGG GTGACGTCGG
79201 CCGCGCGCTC GGCGCCCGAA TCATCCGCCG TCGACGCACC GTCCACGGCC
79251 TCGACCCGCC ACCGCCCGGC AAGAGCAAGA CGCAGCACGG CACGGCCGAC
79301 GGAACCGGTT TCCTCCCCGA CGAGCAGAGT CTCGCCGCCG CGCGGCGCCA
79351 CGACATCCGC CAGCACGTGA TACGCGGACA CATAGGCCCC CAAGGACCCG
79401 GCCGCCTGCG CCCAACTCCA GCCCGCCGGA ACCGGCATAA GCAGCGCGGC
79451 ATCGGTGACG GCCACCGGGC CCACCGCGTC GAACAACCCC ATCACCCGGT
79501 CGCCCACGGC CACCGAACCG ACCTCGCCGC CGACTTCCGT CACCACACCG
79551 GCACCCTCGA CCTGGCCCGC CGTGAGCGGC CCCGGCGCCG CGGCCCGCAC
79601 CGCCACCCGC ACCTCGTGCG GCTCCAGCGC CCGTCCGGCC TCGGGAGCGT
79651 CGACAAGGGA CAACTGCTGT CCGCCGCCCG CCTCTTGGCA CCGAGCCAGC
79701 CGCCACGTGA GCGATCCGAC CGGCGGCACC AGCCGCACCG ACGCGTCGTC
79751 GCGCACGAGC CGTGGCACGT AGGCGCGCCC GTCACGCAGC GCCAATTCCG



79801 GTTCGCCGGA GGCCAGTACG CCGGTCAGCG TGGCCGGAGA AGACTCCAGT
79851 CCGTCCACGT CGAGCAGCGT GAGGCGACCG GGATTCTCGG CCTGCGCGCT
79901 GCGCACCAGA CCCCACAGCG ACGCGCCCGC CAGATCACCG GCGGTCTCAC
79951 CCGGCCGCGC GGCGACCGCG CCTCGGGTGA CGACGACGAG ACGGGTCGCC
80001 GCGAACGCCG GGTCGTCCAC CCACTCCTTG AGCAGCGACA GAAGGGACAC
80051 GGTGGCCAGC CGCGCGTACC CGGCCGGGTC GCCGCCCCTG CCATCGGCAT
80101 CCGCAACGGC CCCGGCACCT GCGCCGGGCG CGGCGCACAC GGCGAGCACG
80151 ACATCGGGCG CTTCGCCCCC AGCCGCCACT CCGTCCCGGA GCGCACCGAA
80201 CGTGTCCCAC ACGGGGCCGG CGGCCAGCGC ATCGGACAAG GCGTCGGCCA
80251 GCGCACCGGC CGACGTACCG CCCATCGGGC CACTCTCGAC CGGCGCGAGG
80301 ACCGCGGCAC GCGGGGCGCC GCCGCCCGTC TCCTCGGCCC GCGCGGCGAC
80351 CTCCATCCAC ACGAGCCGGA ACAGCGCGTC ACGGTCCGCC GCACGGGCGC
80401 CCGCGATCTG GTGGGCGGCC ACCGGCCGTA CCGTGAGCGA CTCCAGCGTG
80451 AGAACCGGCT CCCCGCCTCC GCCCCCGTCC ACGGCCGTGA GGGCCAGCTG
80501 GTCGGGCGCG GTGCGTGCGA TACGTACCCG CAACTTCTCA GCGCCGGGCG
80551 CGTGCACCCG CAACCCGCTC CAGGAGAACG GCAGCAGCAC TTGGTCGGTG
80601 TCGGCGGACG ACGTGACCGC GTCCAGGATC AGCGCGTGCA GCGTGGCGTC
80651 GAGCAACACC GGGTGCACCT GGTAGCGGTC GGCCCTGCCG CTCTCCGCCT
80701 CGGGCAGCGC CACCTCGGCG AAAAGGTCGT CCCCGAGCCG CCACGCGCTC
80751 ACCAGTCCCT GTGAGCCGGG CCCGAAGTCA TAGCCGTACG AAGCGAGTTC
80801 CCCGTACGGA TCCTGCTCGC CGACCGGTGT GGCGCCCGGG GGCGGCCACG
80851 TCCCGCCGAA CGAGGCGTCC CCGGCGTCGG GCCCCGGGGG AGCGACCACG
80901 CCCGCGGCAT GCCGGGTCCA CACGGCCTCC TCGCCCTCAC CCGTGGGCCG
80951 CGAATGGACG GTCACGGGAC GCCGCCCGTC CTCGGCCACG GAACCGACCA
81001 CCACCTGCAC GTCGACCGCG CCCGCACCCT CGTCCCCGAA GGCGAGCGGA
81051 GTGTGCAGCG TCAGCTCCGC CAACTCCGCG CAGCCGGCCC GCACCGCGGC
81101 CTGCAGCGCG AGCTCCACGA ACGCCGAACC GGGCAGCAGC ACCGTGTCCA
81151 TGACCCGGTG CTCGGCCAGC CACGCCTGGT CCCGCGGAGA GATCCGGCCG
81201 GTCAGCAGGT GACTGCCGCC GTCCGCGAGT TCCACGGCGG CTCCGAGCAG

81251 CGGATGCCCC GCGGACGCGA GCCCGAGCCC CGCCGGGTCC CCGGCGAGCC
81301 CCCTGCGCCC CTCCAGCCAG AACCGCTCCC GCTGGAAGGC GTACGTCGGC
81351 AGATCCACCA CCCGAGGCAG CGGCACGGCC GGGAACCAGC CCGTCCAGTC
81401 GACCTCCGCC CCCGCGCCGA AGGCCTGGGC GGCCGCGCGG GTGAGCTGCG
81451 CGGCGTCGCC GTGGTCGCGG CGCAGGGTGG GCACGACGGT GGCGGGCATG
81501 TCGGCCCGCT CGATGGTCTC CTCCATGCCG AGGTTGAGGA CGGGGTGGGG
81551 GCTGGCCTCG ATGAACAGGC GGTAGCCGTC GGCCAGCAGC GCTTCGATGG
81601 TGTCGGCGAA GCGGACGGGC TGGCGGAGGT TGGTGACCCA GTAATCCGTG
81651 TCGAGGGTGG TGGTGTCGTC GAGGCGTTCG GCGGTGACCG TGGAGTAGAA
81701 GGCGACGTCC GTGGTCGTGG GCCGGATGTC GGCCAGGCGC TCGGTGAGGA
81751 GGTCGTGGAG CTGGTCGATC TGGGGGCCGT GGGAGGCGTA TCCGACGTCG
81801 ATGACGCGGG CGCGCAGGCC TCGCGCCTCC GCATCCGCGA CCACGGCTGC
81851 CACATGCTCC GGCGGCCCTG AAATGACCGT AGAGGAGGGC CCGTTGACGG
81901 CAGCGACACA CACGCCGGGC CGGTCGCCGA TGAGCTCAGC AACCTGCTCC
81951 GAGCCGGCCC CCAACGACGC CATGTCGCCC TGCCCCATGA GCTGACGGAG
82001 CGCGTCACTG CGTACGGCCA CGATCCGCGC CGCATCCTCC AGTGACAGTG
82051 CCCCCGCCAC ACACGCGGCG GCCATCTCGC CCTGCGAGTG CCCGATGACG
82101 GCAGCCGGGG TGATGCCGTA ATCGGCCCAC ACCGAAGCCA GCGAGACCAT
82151 CACCGCCCAC AACACGGGCT GCACGACCTC GACCCGGGAC AGCTCACTCC
82201 CGTCCCCGCG CAACACCGCA CTCAGCGACC AGTCCACATG CGCCGACAGG
82251 GCCCGCTCAC ACTCCGCGAT CCGCGCCGCG AAGACGGGGG ACTCGTCAAG
82301 GAGCTGGGCA CCCATGCCCA CCCACTGCGA CCCCTGCCCC GGAAACACCA
82351 ACACCGGACC CGCGCCGGAG GCGCCCTGTA CGGCGCCCTC GACGACGTCC
82401 GGTGACGGCT CGCCCGCCGC CAGGGACCGT AGCCCGGCGA GGAGAGTCTG
82451 GCGGTCCTTG CCCACGACGA CGGCTCGGTT CTCGAACACC GACCGGGTCT
82501 TGACCAGGGA CCAGCCCACG TCCAGCGGCG ACGCGAGCCG CGGGTCGGCG
82551 GTGGCGCGGT CGGCCAGCAG GCGGGCCTGG GCCCGCAGCG CCTCCTCGCC
82601 GCGCGCCGAC ACCACCCAGG GCACCACTCC GGCCGGCGCC GCGGCGTCCT
82651 CCGCCGGAGC GGTCACGGGC TCCGGCGCGT CCGGGGCCTG TTCCAGGATG
82701 AGGTGCGCGT TGGTGCCGGA GATGCCGAAG GCGGACACCC CGGCGCGGCG
82751 CGGGCGTTCG CCGCGCGGCC AGGAGACCGG TTCGGACAGC AGGCGGACGC
82801 CACTGCCGTC CCAGTCCACG TGCGGCGTGG GCGCGTCGAT GTGCAGGGAG
82851 GCGGGCAGCT GTTCGTTGCG CAGCGCCATG ACCATCTTGA TCACACCGGC
82901 GACACCGGCC GACGCCTGCG CGTGCCCGAT GTTCGACTTG ATCGAGCCGA
82951 GCCACAGCGG CCGGTCCGCG GGCCGCTCCT TGCCGTAGGT GGCGACGAGC
83001 GCGCTGGCTT CGATGGGGTC GCCCAGCATG GTGCCGGTGC CGTGCGCCTC
83051 CACCGCGTCG ACGTCCTCGG CGGAGAGCCG CGCGTTGGCG AGTGCCTGCC
83101 GGATCACCCG CTGCTGCGCC TGCCCGTTGG GTGCCGTGAG CCCGTTGCTC
83151 GTGCCGTCCT GGTTGATGGC CGAACCCCGG ATCACCGCCA GGACGTTGTG
83201 GCCGTTGCGC CGGGCCTCCG AGAGCCGTTC GAGTACGACC AGGCCGACTC
83251 CCTCGGCCCA GCCGGTGCCG TCGGCGGCGG CCGCGAACGG CTTGCACCGG
83301 CCGTCCTTGG CGAGCCCGCG CTGCAGCGAG AACTCGACGA ACGAGCCCGG
83351 CGTGGCCATC ACCGTCGCGC CGCCCGCGAG AGCGAGCGAG CACTCGCCCT
83401 GGCGCAGCGC GTGCGCCGCC TGGTGGATCG CCACCAGGGA CGAAGAGCAG
83451 CCGGTGTCGA TCGTCATGGC GGGGCCTTCT AGGCCGAGTA CGTACGACAC
83501 CCTGCCGGAG GCGACACAGC CGAGGTTGCC GGTGCCGATG TAGCCCTCGA
83551 CCTCGGTGGG CTGTTCACCG ACGAGCGCGA GGTAGTCGAA GATGGTCAGG
83601 CCCGTGAACA CCCCGGCGTC GCTGCCCTTG AGGGTCTCCC GGTCGAGGCC
83651 CGGGCGTTCG ATCGCCTCCC ACGCGGTCTC CAGGAGCAGC CGCTGCTGCG
83701 GGTCCATCGC GACGGCCTCG CGGGGGCTGA TGCCGAAGAA TCCGGCGTCG
83751 AAGTCGCCCfi
83801 GCGGCTCTCC GGGTCCGGGT CGTACAGCGT CTCCAGGTCC CAGCCCCGGT
83851 CGTCGGGGAA GGCCCCCATG GCGTCCTTGC CGGCCGCGAC CAGATCCCAC
83901 AGCTCCTCGG CGGAGCGGAC GTCGCCCGGA TAGCGGCAGG CCATGCCGAC
83951 GATCGCGATC GGCTCGTCGT CGGCGGCGCC CCTGGAGGCC CCGGCCGCCC



84001 GCACCGGGTC GGCGGAGGCC GCCGCGTCAC CGGACAGCTC GGCCCGCAGG
84051 ACGTCGGTGA GCGCGTCGGG GGTGGGGTGG TCGAAGACGA CCGTGGTCGG
84101 CAGTGTCAGG CCGGTGCTCT TGTTCAGCCT GTTGCGCAGC TCCACCGCGG
84151 TCAGCGAGTC GAAGCCCAGC TCCTGGAACG GCTTGGTGGC GGGCACCGCG
84201 TCGACGTCCG AGTGCCCCAG CGTGGCCGCC GCCTGGGAGC GCACGTGCTG
84251 CAGCAGCAAC TGCCGCTGCT GCGCCGGCTT CGCCTCCGTC AGCTCCTGCT
84301 GGAGCGACGA TGCCTCCGTG GCGTCTTCCT GCTGTGCCGC GGGTGCGCTG
84351 GCCCGCCGGT TCTCGGGCAG ATCGGCGAGG AGCGGGCTGG GCCGCTGCGC
84401 GGTGAACGTC GACGTGAACT GCGCCCAGTC GAAGTTCGCC ACGGTCAGCG
84451 TCGTCTCACC CGCGTCCAGG GCCTGCTGCA GCGCCTTGAC GCACAGCTCC
84501 GGGCTGAGCG GGTGCAGGCC GAAGCGGCTG AAGAACGTCA ACGCGGCCTG
84551 GTCCGCCGCC ATGCCCGCCT CGGCCCAGGG CCCCCAGGCG ATGGAGGTGG
84601 CGGGCAGGCC CTCGGCGCGG CGGTGCTCGG CGAGGGCGTC GAGGAAGTGG
84651 TTGGCCGCAC CATAGGCGCC CTGCTGGCCA CTGCCCCACA CGCCTGCGCC
84701 CGACGAGAAC ATCACGAACG CCGAGAGCGG CAACTCCCGG GTCAGTTCAT
84751 GCAGATGGTG AGCGGCGAGC GCCTTCGGAC GCAGCACCTC GTCCAGCTCG
84801 GCACCCGACA CGTCGCCGAG ACCGATGTAG TTCGGCACGC CGGCCGCGTG
84851 GATGACGGCG GTCAGCGGGT GCTCGGCGGG GACATCGTCG ATGAGGCGTC
84901 GCACCTGCTC GCGGTCGCCG ACGTCGCAGG CGGTGACGGT GACGGCGGCC
84951 CCCAACTCCG TCAGTTCCGC GGCGAGTTCC TGTGCTCCCG GGGCGTCGGG
85001 GCCGCGGCGG CTGGTCAGGA GGAGGTGCGG GGCGCCCGCA CGGGCGAGCC
85051 ACCGCGCGAG GACGGCGCCG ATGCCGCCGG TCCCGCCGGT GATGAGAGTG
85101 GTGCCGTCGG GCCGCCAACC AAGCCCGCTG CCGACCGTGT TGGCGGGCGC
85151 GTGTGCAAGG CGACGGGCAT GGACGCCGGA CGGCCGGATG GAGATCTGGT
85201 CCTCGTCCTG CGGAACCAGC GCGGCGGCCA GCCGGGCCAG CGTCTGATGG
85251 TCGATACGAG CGGGCAGATC GACCAGCCCG CCCCACAGCC GCGGATACTC
85301 CAGCGCAGCG ACGCGCCCCA GCCCCCACAC CTGAGCCTGC ACCGGGTGGG
85351 TGAGGGCGTC GCCGGCGCTC GTGGAAACAG CCCCCTGCGT GAGAGTGCGT
85401 ACGGCGATGT CGGCGCCGTT GTCCGCGAGG GCCTGGACGA GAGCGGTCGT
85451 CGCGGCGAGT CCGGCGGGCA CGGCCGAGTG CTCGGGATGC GGCTCCTCGT
85501 CCAGGGCCAG CAGATTGACG ACTCCGGCAA ACGCGGCCCC GTCCATCAGG
85551 ACACGCAGCT CCTGCGCCAA CTCCGTACGC TCCATGGCAC GTGCGTCGAC
85601 CACGTGGCGT CGCACCTCGC CACCATGGGC GGTCAGCGTC TGCGCGGTCG
85651 CGAGGACGGC CGGGTGGTCG GCGTGCGCGG CGGGCACGAG CAGCAGCCAG
85701 GCCCCGCTGA GCTCCGGCGC CGGCACGTCG GGCAGATGCT TCCAAGTGAC
85751 CTGATAACGC CAGGAGTCGA CGGTGGACTG CTCGCGGTGC CGACGCCGCC
85801 AGGCCGAGAG GACGGGCAGC GCGGACTCCA GCGCTCCGAC GCTCTCCGCC
85851 TGCCCCTCGA TCTCCAGACT GCCGGCGAGG GCGTCGATGT CCAGGTCCTC
85901 GATCGCCTGC CACACCCGGG CCTCGACCGG ATCGTGCCCA CCACCCACGG
85951 CTGCGACCGC CGCGGGCGGC TCCACCCAGT AGTGCTTGTG CTGGAAGGCG
86001 TAGGTGGGGA GGTCGACGGT ACGGGGGGTG GGGTCGGCCG GGAACCAGCG
86051 CCGCCAGTCG ACGGGGGCGC CGGCGGTGAA GGCGTGGGCG GCCGCGCGGG
86101 TGAGCTGGGT GGTGTCACCG TGGTCGCGAC GCAGGGTGGG GATGGTGACG
86151 GCCGTCCCCG CAGCACCGGC CTGCTGCTCG ATGGTCTCCT GGATGCCGAG
86201 GTTGAGGACG GGGTGGGGGC TGGCCTCGAT GAACAGGCGG TAGCCGTCGG
86251 CCAGCAGCGC TTCGATGGTG TCGGCGAAGC GGACGGGCTG GCGGAGGTTG
86301 GTGACCCAGT AGGCGGTGTC TAGGGCGGTG GTGTCGTCGA GGCGCTCTGC
86351 GGTGACCGTC GAGTAGAACG CCACGTCGGT GGTGGTCGGC TGGATGTCGG
86401 CGAGCCGGTC GGTGAGGAGG TCGTGGAGCT GGTCGATCTG GGGACCGTGG
86451 GAGGCGTACC TGACGTCGAT GACGCGGGCC CTGAGTCCCT GCGCCTCCGC
86501 ATCGGCGACG ACGGCTGCCA CATGCTCCGG CGGGCCCGAA ATCACGGTCG
86551 ACGACGGTCC GTTGACGGCC GCGACGACTA CGCCCGGCCG GTCGCCGATC
86601 AGCTCTGCGG CCTGCTCGGC ACCGGTGCTG AGCGAGGCCA TGTCGCCGTG
86651 CCCTTGCAGC TGACGAAGCG CGTCGCTGCG TACGGCTACG ATCCGTGCCG
86701 CATCCTCCAG TGACAGTGCC CCCGCCACAC ACGCGGCAGC CATCTCGCCC
86751 TGCGAGTGCC CGATGACGGC AGCCGGGGTG ATGCCGTAAT CGGCCCACAC



86801 CGCAGCCAGC GAGACCATCA CCGCCCACAG CACGGGCTGC ACGACCTCGA
86851 CCCGGGACAG CTCGCTCCCG TCCCCGCGCA AGACATCACT CAGCGACCAG
86901 TCCACATGCG CCGACAGCGC CTGCTCACAC TCCGCGATCC GCGCCGCGAA
86951 GACGGGCGAC TCGTCAAGGA GCTGGGCGCC CATGCCCACC CACTGCGACC
87001 CCTGCCCCGG AAACACCAAC ACCGGCCCAG GACCCACATC ACCGGCCACC
87051 CCGGCCACCA CATCCGCCGA CGCCTCACCC GCGGCCAATG CCTCCAGGCT
87101 GGCACCAGCC TGAGCCAAGT CCCGCCCCAC GACCACGGCC CGCTGATCGA
87151 ACAACGCGCG TGTCGTGGCC AGCGACCAGC CCACCTCGGA GACCGACGCA
87201 TCCGCCAGCC CGGCCGCGAA CTCGCCCAGC CGCCGCGCCT GTTCACGCAA
87251 CGCGTCCGGC GTCCGCCCGG ACACCACCCA CGGCACGACC CCACCCGGCT
87301 CAGCCGCCAC GGGGCCCGGC GCGTCCTCTT CCGGCGGCGC CTCCTCCAGA
87351 ATCAGGTGCG CGTTCGTCCC GGAGATCCCG AACGCCGAGA TGCCTGCCCG
87401 CCGCGTGCGC TCCGCCGGCC AGTCCACGGG CTCGGAGAGC AGTCGTACGC
87451 TGCCCTGTTC CCACTGGACG TGCGGTGACG GGGCGTCGAT GTGCAGGGAG
87501 GTCGGGAGGA GACCGTTGCG CATCGCCATG ACCATCTTGA TGACGCCCGC
87551 GACACCGGCG GCGGCCTGCG TGTGGCCGAT GTTGGATTTC ACCGAGCCGA
87601 GCCAGAGCGG ACGGTCCTCC GGGCGCCCCT GGCCGTACGT GGCGATCAGG
87651 GCCTGCGCCT CGATGGGGTC GCCGAGCGTG GTGCCGGTGC CGTGCGCCTC
87701 TACGGCGTCG ATGTCCTCGG CGGAGAGGCG GGCGTTGGCG AGGGCGGCGC
87751 GGATGACGCG TTCCTGGGAG GGGCCGTTGG GGGCGGCGAG CCCGTTGCTC
87801 GTACCGTCCT GGTTGGTGGC CGAACCCCGT ATCACCGCAA GGACCTTGTG
87851 GCCGCGGCGC CGCGCTTCGG AGAGCAGCTC CAGCGCCACC ACCCCGGCGC
87901 CCTCGCCCCA GCCGGTGCCG TCGGCGGCGG CCGCGAACGG CTTGCACCGC
87951 CCGTCGGGCG CGAGCCCCCG CTGCCGGGAG AACTCGGTGA ACGAACCCGG
88001 CGTCGCCATC ACCGTCGAAC CGCCCGCCAG CGCGAGCGAG CACTCGCCCT
88051 GCCGCAGCGC CTGACTTGCC AGATGGATCG CCACCAGGGA CGACGAGCAC
88101 GCCGTGTCGA CGGTGACCGC GGGACCTTCG AGCCCCACCG TGTAGGAGAT
88151 CCGGCCCGAC ACCACACTGC CGAGGTTGCC GGTGCCGATG TACCCCTCGA



88201 CGTCGCTGGC CGTCTGGCTG ATCAGCGTCA GGTAGTCGTG GGCGCTCACT
88251 CCGGTGAAGA CGCCGGTGTC GCTGCCCTTC AGCGCGTGCG GGTTCATGCC
88301 CGCGTGCTCG ATCGCCTCCC ACGCGGTCTC CAGGAGCAGC CGCTGCTGCG
88351 GATCCATCGC CGTGGCCTCG CGCGGGCTGA TGCCGAAGAA CTCGGCGTCG
88401 AAATGGCCGG CGTCGTACAG GAAGGCGCCG TCCCGCACAT AGCTGGTGGC
88451 CGGATGCTCC GGATCCGGGT GATACAGCGA CTCCAGGTCC CAGCCCCGGT
88501 CGTCGGGGAA CCCCGCGACC GCGTCACCCC CGTCGCGCAC GAGTTCCCAG
88551 AGGTCCTCCG CCGACCGGGC GCCGCCCGGG TAGCGGCAGG CCATGCCGAC
88601 GATGGCGACC GGCTCGGTCG ATTCCTTGTC GTGGAGCCGT TGCCGGGCCT
88651 GGCGCAGCTC CGCGGTGACC CACTTGAGGT GATCGAGAAG CTTCTCCTCG
88701 TTCGACATCT GACCCAGGCT CCTTGGCGCT ACGTGGTGAT CGGGGCGTAT
88751 GAGGTTGGGG GAGGGCAAGG GGGCCGGTGT GGCCGGGGCT CATCGCGCTC
88801 AGGACTGATC GCTGCTCAGG ACTTCCCGAA CTCACTGGAG ATGAGGTCGA
88851 AGATGTCGTC CGCGCTCGCC GCCTCCAGAT CGGCATGGGC CGAATCAGTG
88901 CCTTCCGGCC CGTCCTGCGC CGGACTCCAC TTCGACACAA GGACCTGCAG
88951 CCGGCCCACG ATGCGGCGCC GGGCCGCCTC GTCCACCTCG GCCGCTCCGA
89001 ACGCCGTGTC CCACTTGTCG AGCGCCGCGA GCACGTCGCC CTCACCTGCG
89051 ACCTCGGCGC CGTCGCCGAG CTGTCCGCGC AAGTGCGTGG CGAGGGCCTC
89101 GGGCGTGGGA TGGTCGAAGA TCACGGTGGC GGGGAGCGAG AGTCCGGTCG
89151 TGGTGTTGAG CTGGTTGCGC AGCTGGACCG CGGTGAGCGA GTCGAAGCCC
89201 AGCTCCTGGA ACGGCTTCGC GGCGGGAATG TCCTCCACCG TGCGGCCGAG
89251 CGTCGCGGCC GCGTATGTCC GGACCTGCTG GACCAGGAAG CCGAGCCGCT
89301 GTGATGCGGG CGTCTTCGCC AGCTCCTCGC GGAACGCGCT CGTCTCGGCG
89351 GCGGTCCCCG TCTGCTCGGC CTCCCGCTGG TTCTCCGGAA GGTCGTCGAG
89401 GAACGGACTG GGCCGCTGCG CGGTGAACGT CGGCGTGAAC TTCGCCCAGT
89451 CGAAGTTCGC CACGGTCAGC GTGGCGTCGC CCGCGTCGAC CGCCTGGTGC
89501 AGCGCCTTGA CGCACAGATC CGGAGCGATC GGGAGCAGAC CGAAGCGCTT
89551 GAAGTACGTC AGTGACTCCG GGTCGGCGGA CATGCCCGCC TCGGCCCAGG



89601 GCCCCCAGGC GATGGAGGTG GCGGGCAGGC CCTGGGCGCG GCGGTGCTCG
89651 GCGAGGGCGT CGAGGAAGTG GTTGGCCGCA CCATAGGCGC CCTGCTGGCC
89701 ACTGCCCCAC ACGCCGGCGC CCGACGAGAA CATCACGAAC GCCGAGAGGT
89751 CCAGGTCGCG CGTCAACTCG TGCAGGTTCC AGGCCGCGTC GGACTTCGAC
89801 CCCAGCACCT CGCCGAGGCG CGCGGTCGTC AGATCACCGA TCGCGGTCAG
89851 ATCGGTCATG CCCGCCGCGT GGATGACGGC TGTGAGGGGA TGCTCGGCCG
89901 GCATGTCGTC GATGAGGCCG CTCAGTTGGC GGGGATCGCT GACGTCGCAG
89951 GCGGTGATGG TGACGGCGGT GCCGAGCCCG TCGAGCTCGG CGGCGAGTTC
90001 CCGGGCGCCG GGCGCGTCGG GCCCGCGACG GCTGGTGAGG TGAAGACGGG
90051 GGGCGCCCTG CCGGGCCAGC CAACGGGCGA GGACGGCACC GATGCCGCCG
90101 GTCCCGCCGG TGATCAGGGT GGTGCCCCGA GGCCGCCAGG TGGCCTCGCT
90151 GTGCACGGGA TTCTGAATGC TTCCGACGGC GTGCGTGAGG CGCCGGTGGT
90201 GGATTCCGGT GGGGCGGACG GCGGTCTGGT CCTCGTCGTC CTGGGGGAGG
90251 AGAGCGGCGG CGAGGCGGGG GAGGGTGTGG CGGTCGATAC GAGCGGGGAG
90301 GTCGACGAGT CCGGCCCAGA GGCGCGGGTG TTCGAGGGCT GCGACGCGGC
90351 CGAGCCCCCA GACGTGAGCC TGGAGGGGGT GGGTGAGTGG GTCGGTGGCG
90401 CCCGTGGACA CGGCACCCTG CGTGACGGTG TGCAGGGGTG CGGTCGTGCC
90451 GTTGTCGCCG AGGGCCTGGA GGAGAGCGGT CGTCGCGGCG AGCCCCGCGG
90501 GCACGGCGGG GTGCTCGGGG TGCGGCTCCT CGTCCAGCGC CAGCAGATTG
90551 ACGATTCCGG CAAGACCGGC CGTGTCCACC GCGGCCAGCT CCTGACGTCC
90601 CGCCCGGCCG GTCTCGACCG GATGCAGCCG GACGGCGGCC GCCCCGTGCT
90651 CGCTCAACGC CTCGGCGGTG GCTCGTACGG CGGGGTGCTC CGCCTTGTCG
90701 GCAGGGACGA ACAGCAGCCA GTCGCCGCCG AGTTCCGGTG CGGGCCCGTC
90751 GGACCGCTGT TTCCACGTGA CGCGGTACCG CCAGGAGTCG ATGGTCGCCT
90801 GGTCCTGGTG CCGACGCCGC CAGCCCTTGA GCACCGGCAA CGCGGGCTCC
90851 AGCGCCCGGA CCGCCTCCTC GCTGCCCTCC TCCGACCCCA GCGTCTCGGC
90901 CAGCAGACCG AGATCGAGCT CCTCGACGGC GTGCCACAGC TGGGCCTCGG
90951 CCGCACTCTG CTCACCGCTG ACGGCGCCCG AGGCGGACGC GGAACGTTCG



91001 AGCCAGTAGT GCTGGTGTTG GAAGGCGTAG GTGGGGAGGT CGACGGTGCG
91051 GGGGGTGGGG TCGGCCGGGA ACCAGCGCCG CCAGTCGACG GGGGCGCCGG
91101 CGGTGAAGGC GTGGGCGGCG GCACGGGTGA GCTGGGTGGT GTCGCCGTGG
91151 TCGCGGCGGA GGGTGGGGAC GACGGTGGCG GGGATGTCCG CCTGCTCGAT
91201 GGTCTCCTCC ATGCCCAGGC CCAGCACGGG GTGGGCGCTG GCCTCGATGA
91251 ACAGGCGGTA GCCGTCCGCG AGAAGGGCTT CGATGGTGTC GGCGAACCGG
91301 ACCGGCTGGC GGAGGTTGGT CACCCAGTAA TCCGTATCCA GGGCTGTGGT
91351 GTCCGTCAGA CGCTCGGCGG TGACCGTCGA ATAGAAGGCC ACGTCCGTGT
91401 TCGCGGGCCG GATGTCAGCC AGGCGTTCGG TCAGCAGATC GTGGAGCTGG
91451 TCGATCTGGG GGCCATGCGA GGCGTACCCG ACGTCGATGA CACGGGCGCG
91501 CAGACCACGT GCCTCCGCAT CGGCGACCAC GGCAGCCACA TGCTCCGGCG
91551 GCCCTGAAAT CACCGTAGAC GACGGCCCAT TGACCGCCGC GACGACCACG
91601 CCCGGCCGGT CACCGATCAG CTCAGCGGCC TGCTCGGCAC CGGTGCTCAG
91651 CGAGGCCATG TCACCGTGCC CTTGCAGCCG ACGAAGCGCG TCACTGCGTA
91701 CGGCTACGAT GCGCGCCGCA TCCTCCAGCG ACAGCGCCCC CGCGACGCAC
91751 GCGGCAGCCA TCTCACCCTG CGAGTGCCCG ATCACAGCAG CCGGAGTGAC
91801 CCCGTAATCA GCCCACACCG CAGCCAGCGA GACCATCACC GCCCACAACA
91851 CCGGCTGCAC GACCTCGACC CGGGACAGCT CACTCCCATC CCCGCGCAAC
91901 ACCGCACTCA GCGACCAGTC CACATACGCC GACAGCGCCC GCTCACACTC
91951 CGCAATCCGC GCCGCGAAGA CGGGGGACTC GTCCAGCAGC TGGGCACCCA
92001 TGCCCACCCA CTGCGACCCC TGCCCCGGAA ACACCAACAC CGGCCCAGGA
92051 CCCACATCAC CAGCAACCCC GGCCACCACA CCCGCCGAAG CCTCACCCGC
92101 AGCCAACGCC CCCAGGCCAG CCGTCAACGC ATCGCGGTCA CGCCCCACCA
92151 CCACAGCCCG GTGCTCGAAC ACCGACCGGG TCGTGGTCAA CGACCAGCCC
92201 ACATCAGCCG CCGACGCATC CGCCGGCCCG GCCGCGAACT CGCCCAGCCG
92251 CCGCGCCTGT GCACGCAGCG CGTCCGGCGT CCGCCCGGAC ACCACCCACG
92301 GAACGACCCC ACCCGGCTCC TCGGCCACGG AGCCCGGCAC GTCCTCCTCC
92351 TCCGGTGGTG CCTCCTCCAG GATCAGATGC GCGTTCGTCC CCGAGAAGCC
92401 GAACGAGGAC ACCCCCGCCC GGCGCGGGCG CTCGCCCCGG GGCCACTTCA

92451 CCGGGTCGGT GAGCAGGCGC AGCCCGCTGC CGTCCCACTC CACGTGGGGC
92501 GAGGGGGCGT CGACGTGCAG GATGGCGGGC AGCAGGTCGT GCCGCAGGGC
92551 CAGGACCATC TTGATGACAC CGGCCACACC GGCGGCGATC TGCGTGTGGC
92601 CGATGTTGGA CTTCACCGCT CCCACCCACA GCGGCCGGTC CTCCGGCCGT
92651 TCCCGGCCGT AGGCGGAGAT GAGAGCCCCG GCCTCGATGG GGTCGCCGAG
92701 CGTGGTGCCG GTGCCGTGCG CCTCCACGGC GTCGATGTCC TCGGGGGCGA
92751 GGCGGGCGTT GGCGAGGGCG GCGCGGATGA CGCGTTCCTG GGCGGGGCCG
92801 TTGGGGGCGG TCAGGCCATT GCTCGCGCCG TCCTGGTTGA TCGCCGAACC
92851 CCGGATCACC GCGAGGACCT TGTGGCCCTT CCTGCGGGCG TCGGAGAGAC
92901 GCTCAAGGAG AACCACCCCC GTACCCTCCG CCATGCCCAT GCCGTCGCTG
92951 CTCGCCGAGA ACGGCTTGCA CCGTCCGTCG GGGGCCAGGC CGCGCAGTTC
93001 GCTGAAGCCG ATCAGCGGGG CGGGCGACGA CATCACGTAC GTGCCGCCCG
93051 CCAGCGCCAG CGAGCACTCC TGTGTGCGCA GGGCCTGGGT GGCGAGGTGA
93101 AGGGAGACCA GCGACGAGGA GCACGCCGTG TCGACCGTCA CCGCGGGGCC
93151 TTCGAGGCCC AGGGTGTAGG CGACGCGGCC GGAGGTGACG CTGCCGGAGT
93201 TGCCGATGGT GAAGTATCCG GCGGTGCCCT CGGGGACCTC GGACGCGCCG
93251 AGGGCGTAGT CGAGTCCGTC ACAGCCGATG AAGGTGCTGG TGTCGCTGGA
93301 GCGGAGGCTG AGGGGGTCGA TGCCGGCCCG TTCGATCGCC TCCCACGCCG
93351 TCTCCAGGGC GAGCCGCTGC TGCGGCGCCA TGGCCGCGGC CTCGGTGGGT
93401 CCGATGCCGA AGAAGGTGGG GTCGAAGTCA CCGGCGTCGT AGACGAAGCC
93451 GCCTTCCCGG ACGTAACTGG TGCCGGTGCT CTCGGGGTCC GGGTCGTAGA
93501 GGGAATCGAG GTCCCAGTTG CGGTTGCCGG GCAGGGGCGC GACGGCGTCG
93551 CCGCCGGTGG AGACCAGCTC CCAGAACTCT TCGGGAGACC GGACTCCGCC
93601 GGGCAGCCGG CAGGCCATGC CGATGACCGC GACCGGTTCG TGGCCCGCCG
93651 ACTCGACGTC CTGCAGCCGG CGTTCCGTCT GACGCAGGTC CGCGGTGACA
93701 CGCTTGAGGT ATTCCAGAAG TTTCTCTTCG GTGTGCGCCA TCCCGGTGAC
93751 AACCGCCCCT CTCCGCGAGA ACAGACCGCA GACTCGTCGA CGGCGCTAAA
93801 GCCCTCCTAA TACTCGGCTG TGTACCGCTC GCTGCCACGG GTGTCCGCAC
93851 TGGTCGGAGG CTCCGGCCCA GGGAACAGGG GCTTTCTTAG GGGCGCTTAA
93901 GCGGTGCCTG CCAGGGTGTG CCGGTGTCAG GCCGTCACGC CCTGATCAGC
93951 GGCGTCGCCC GTGCCGTGCC CGTGCGGTCG GTGGGCCTGA CCGTCGGTCC
94001 GGACAACGCG AAGCGAGGCA TCGTGCCCAT CACGGATAGC AAGCCGGCCG
94051 CCACATTCCC CGACCTGGTC GACCCGTCGT TCTGGGCGCG GCCGCACGCG
94101 GAACGCGTGG CGCTGTTCGA GGAGATGCGC GGGCTGCCGC GGCCGGCGTT
94151 CATCCGGCAG AACATGCCCG GCGTGCCCTG GACGTTCGGC TACCACGCGC
94201 TGGTCAAGTA CGCGGACATC GTGGAGGTGA GCCGCCGCCC GCAGGACTTC
94251 TCCTCGAACG GCGCGACCAC CATCATCGGT CTGCCGCCCG AGCTGGACGA
94301 GTACTACGGC TCGATGATCA ACATGGACAA CCCGGAACAC TCGCGGCTGC
94351 GGCGCATCGT CTCGCGTTCC TTCGGCCGCA ACATGATCCC CGAGTTCGAG
94401 GCCGTGGCGA CCCGCACCGC CCGCCGCATC ATCGACGAGC TCATCGCGCG
94451 GGGACCCGGC GACTTCATCA GGCCCGTCGC CGCGGAGATG CCCATCGCCG
94501 TGCTCAGCGA CATGATGGGC ATCCCGGCGG AGGACCACGA CTTCCTCTTC
94551 GACCGGTCCA ACACGATCGT CGGCCCCCTC GACCCGGACT ACGTGCCGGA
94601 CCGGGCGGAC TCCGAACGCG CGGTGATCGA GGCGTCACGC GAACTCGGCG
94651 ACTACATCGC TGGCCTTCGT GCGGAACGGC TCGCCGCCCC CGGCAACGAC
94701 CTCATCACCA AGCTCGTGCA AGTCCAGGCG GACGGCGAGC AGTTGACGCG
94751 GCAGGAACTC GTCTCCTTCT TCATCCTGCT CGTCATCGCC GGGATGGAGA
94801 CCACCCGCAA CGCCATCTCG CACGCGCTGG TACTGCTGAC CGAGCATCCC
94851 GAGCAGAAGC AGCTGCTGCT CTCGGACTTC GACACGCACG CGCCGAACGC
94901 GGTCGAGGAG ATCCTCAGGG TCTCCACGCC CATCAACTGG ATGCGGCGCG
94951 TCGCCACCCG CGACTGCGAC ATGAACGGCC ACAGGTTCCG CAGGGGCGAC
95001 CGGATCTTCC TGTTCTACTG GTCGGGCAAC CGGGACGAAT CCGTCTTCCC
95051 TGACCCGTAC CGGTTCGACA TCACGCGCGG GACGAACGCG CACGTCACGT
95101 TCGGCGCGGT GGGCCCGCAC GTCTGCCTCG GGGCCCACCT CGCCCGTATG
95151 GAGATCACCG TCCTGTACCG GGAGCTGCTC GCGGCGCTGC CCCAGATCCA



95201 TGCCGTGGGG CAGCCCCGCA GGCTGGACTC CAGCTTCATC GAAGGGATCA
95251 AGCACCTGCA CTGCGCCTTC TGAGCACATA CGCTTCCCTC TGCGCATGTG
95301 CGCTCACGAC GCTCCGATCA GCGACTGCCA ACGACTGTCA GCGACCGGAC
95351 AGGGCCAAGG GCGGTGGGGA CATCAGGTGC ATGTCACCCG CGAGTATGGC
95401 CCGCTGCAGC TCCTGGAGCG GGCGCCCGGG TTCGAGCCCC AGCTCGTCGT
95451 TGAGCGTCTT GCGCACCGAC TGGTACACCT TCAGCGCGTC CGCCTGCCGC
95501 TCGGAGCGGT AGAGCGCCAG CATCAGCTGG CGGTAGAACG CCTCGCACAT
95551 CGGGTTCTCC GCGGTGAGGG CGTACAGCAT GCCCACGGCC TCGCGGTGCC
95601 GGCCGAGCTG GAGCTGGCAC TCGACGAGCA TCTCCTGACA CTCCAGGCGG
95651 ATCTCGGTCA GCCAGGTCGA GAAGCCGTCG ATGATCGGGC CGTTGGTGCC
95701 GGGACCGTTC CCGCCCTGCC CGAGGATCGG GCCGCGCCAC AGCGCGAGCG
95751 CCTGCCCGAA ACAGGAGGCC GCCTCGTCGA ACCGCTTCTC CCTGAGCAAC
95801 GACCGCCCCA CGTCCACCAG TTCGGGGAAG ATCTGGGCAT CGATCTGGTC
95851 GTCGTCCCGC TTGTGCAGGA CGTACCCCGG CGCACGGGTC TCGACGGGGT
95901 TGCCCGCCGA ACCGGGCACC TTGAGGAACT TGCGGAGCTG GGAGATGTAC
95951 ACATGCAGTC CCGCCGTGGC GCGCCGCGGC AGGTCCTCGC CCCAGATCTC
96001 CCGCATCAGC TGCTCCAGGG AGACCACCCG GTCGGCGCGG ATGAGGAGCA
96051 CGGTGAGGAC GATCTCCACC TTCTGGGCGT TGATGGTGGC GTAGTCGTTT
96101 CCGTCCTTGA TGCGGAGCGG GCCCAGCATT TCGTATCTCA CCGAGCGTTC
96151 CCCCTTGCTG TCGCACGCTG CTGCGCACTG TCGGCCAGGG CCTTGGAGAT
96201 GACTTCCGTG ACGCCCTGCT GGTGCGTGTT CAGATAGAAG TGGCCCCCGG
96251 CGAAGACCTT GAGGTCGAAG GGGCCTTCCG TGTGCTGCTG CCAGGCCTCG
96301 ACCTCGTCCA GCGGCGCCTG CGGGTCCCGG TCGCCCACCA GGGCGGTGAT
96351 GGGGCAGGAC AGCGGCGGCG ACGGGTTCCA CCGGTACAGC TCGACCGCCC
96401 GGTAGTCGTT GCGGACGACC GGGATGATCT CCGCGAGCAG TTCCTCGTCG
96451 TCCAGGAACC GCGGGTCAGT GCCACCGGCC CGGCGCAGCT CGGCGGCCAA
96501 CTCGGTGTCG TCGAGGAGGT GTACGGTGCC GCGCCGGAAG CGGGACGGCG
96551 CGCGGCGTCC CGAGACGAAC AGCCGGCAGG GCTGCTTCCC CGTGCGCTCG



96601 CGGAGCCGCT GGGCGACTTC GTAGGCGAGG ACGGCGCCCA TGCTGTGGCC
96651 GAAGAACGCC AACGGGCGGT CGTCGAACGG GCCGAGCGCA TCGGTGATGA
96701 GGTCGGCGAG TTCCCCGATG TCGTCCAGGA GCCGCTCTCT GCGGCGGTCC
96751 TGTCGCCCGG GGTACTGCAC CGCGAGGACC TCGCTGTCGG TCGGGAGAGT
96801 GGGGGATTGC GCAAGGGGGT GGTAGTAGGA GGCCGAGCCG CCCGCGTGGG
96851 GGAAGCAGAC CAGGCGAACG ACGGCTTCCG GTCGGGGCCG GAAGCGACGT
96901 ATCCAAGGGT CCGACATATC GGGTGGGGGG AAGGCAGACA AGATCTTTCC
96951 CTTCGCCAGG AACGCTGACA ACGGTGTGTC GCCACATCAC ATAGCCGCTC
97001 CTGATCATGC GCAGCTCAAA GTTTAAACGG CAACGTCGCT AACGGGGGAG
97051 CAGGGCGGAA TCAGACATTC CCCATCCTTT ATTCCGCGAT TCTTACGTGA
97101 TCGAATCCCG GCGGCCAAGA TGGAGTAAAT TTCAATATGA ATGCTTAACG
97151 CCGCACAGCT TGTACGGCGG GCCGCCCGGG CGGTGACTGG CGTCCCTGCC
97201 AGCCGTGATG GCCTGACGAG GCCTCCGGGA TCCATCCCCC GCCCGCTGTC
97251 GCCGAGTTCT TTGCGGGATT ATTACGTTGC ATTGGTTTGC TTCGTGGCCC
97301 GGGCCGTTGG CCTGCGCTAT TTGGCAGCCT TCCGTCATGG GTGGTAAAAG
97351 ATCGCCTTTC CCCTCTGGGG TGCCGGTCGA GCTGGCCTCG ACCGCGATTG
97401 TGGCTTGTTG TTTTCTTGTG GCGCCGCGTG TGAAACAGCG GCAGTTGGCC
97451 ACTCGCTCTG ACAGGCTCCG GGGACGGGGT TGTCACCTTT TGGGGTGACT
97501 GGCCTCGTTC AAGGCGTCCT GGCCCGTGGT GCATCCGCGA TCGTCGTGCC
97551 ATGGGTGAAG TGGGAAGGAG CACAGAACGA TGAGCGAGAG CATGGCGTGG
97601 CTGACGCGGG ACGTCCGCAA GGCCCGCAAG GAGGGCAGTG CGGGGACCGC
97651 GCGGCGCCGA GCCGACCGGC TGGCGGACCT GGTCGCCCAC GCCCGCTCGG
97701 CGTCGCCGTA CTACCGGGAG CTCTACCACG GCCTGCCCGA GCGGATCGAG
97751 GACCCGACGC TGCTGCCGGT GACGGACAAG AAGCAGCTGA TGGACCACTT
97801 CGACGACTGG CCGACGGACC GCGACATCAC CTTCGAGAAG GTCCGCGCGT
97851 TCACCGACGA CCCCGAGCTG ATCGGGCGGC GCTTCCTCGG CCGCTATCTG
97901 GTGGCCACCA CGTCGGGCAC CAGCGGCAGG CGCGGCCTGT TCGTGCTCGA
97951 CGACCGGTAC ATGAACGTGT CCTCCGCCGT CTCCTCCCGG GTGCTCGCCT



98001 CCTGGCTCGG CCCCCTCGGC ATCGCCCGGG CCGTCGTCCA CGGCGGCCGC

98051 TTCGCCCAAC TCGTCGCCAC CGAGGGACAT TACGTCGGCT TCGCCGGATA

98101 CTCCCGCCTG CGCCAGGACG GCGGAGCGCG CAGCAAGCTC GTCCGCGCCT

98151 TCTCTGTGCA CGAGCCGATG TCACGTCTGG TCGCCGAACT CAACGAGTAC

98201 CGGCCCGCGT TCGTCATCGG CTACGCCAGT ACGATCATGC TGCTCACCGC

98251 CGAACAGGAA GCGGGCCGGC TGCACATCGA CCCGGTGCTG GTCGAGCCCG

98301 CGGGCGAGAC GATGACCGAG AGCGACACCG ACCGCATCGC TGCGGCGTTC

98351 GGCGCCAAGG TGCGCACGAT GTACAGCGCG ACCGAGTGCA CCTACCTCAG

98401 CCACGGCTGC GCCGAGGGCT GGTACCACGT CAACGACGAC TGGGCCGTGC

98451 TCGAACCGGT CGACGCCGAC CACCGGCCCA CCCCGCCGGG GGAGTTCTCG

98501 CACACCACCC TGATCAGCAA CCTCGCCAAC CGCGTCCAGC CGTTCCTCCG

98551 CTACGACCTG GGCGACAGCG TCATGCTCCG CCCCGACCCC TGCCCCTGCG

98601 GCACCCCCTC GCCCGCGATC CGGGTCCAGG GCAGGTCGGG CGACATCCTC

98651 ACCTTCCCCT CGGGCCGGGG CGACGACGTC AGCCTCGCCC CGCTCGCCTT

98701 CAGCAGCCTC TTCGACCGCA TGCCCGGAGT CGAGCTCTTC CAGATCGAGC

98751 AGACCGCGCC GTCGACCCTG CGCGTCCGCG TGGTCCAGGC GCCCGGCGCC

98801 GACGCGGACC ACGTGTGGCA GCGGGCCCAC GACGGGCTGA CCCACCTCCT

98851 CGCCGACAAC AAGCTCGACA ACGTAACCGT CGAACGGGGC GAGGAGCCGC

98901 CGCGGCAGGC ATCCGGCGGC AAGTACCGGA CGATCATCCC GCTCGCCGCC

98951 TGAACGCTCG CCGACTAGCC GCGCGCCGCC TGAGCTGCTC TCACCGCGCG

99001 TACGGGCGCA GCGGAGGCTC CTCGTCGACC CACGGCTGGC TGTGGATCAG

99051 CAGCTCGATC GGGAAGTTCA GCAGGCCGGG CAGGGCGTCG ACGGCCTCCT

99101 GGCTGTTGAG CGGCATGACC GGCTTGGCGC AGTGCGCGCG GTCGATGCGG

99151 CTCGTGTCGG CGGGACCGGG GTGCTCGATC GCATCGGCGA CCAGGTCGTA

99201 GCTGACCTGG TCGACGACCA TGGCGATGTG GGTCGGCCAC GGCCGACCCG

99251 GACAGATGTC CTGGACGCCG ATGCGGTGGG CGCCCGGCAG CGACGGCGCC

99301 TCCCCGTCGG CCACCACGGA CTCGTCGGCG TATGAATAGA TCGTGGTGTA

99351 CGACGGTCCT GCGGGCATGG GCGTGCCGTC GGCGCCCAGA GCCTTCGACC
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99401 


AGTTCGAGTC 


GCGGGCGAAC 


TGCAGGACCG 


ACGCCGGGCA 


. GCCCGCCACC 


99451 TCGGCGATCG GGCGGCAGGG CGAGGCCAGC CGGGTCCCCT GGAACGGGGA

99501 GCCCAGGGTC ACCATGTCGT CGACCTTCCC CGGCAGGTCC GGCCAGAAGC

99551 GCAGGGCCCA CGCCGTGAGG AGGCCGCCCT GGCTGTGCCC GACGAGATCG

99601 ACCTTCCGGC CGGTGGCCTC CTGGATCGCG CGGGTCGCGT ACACCACGTA

99651 CTCGACGGAC TCCTGCATGT CACGGAGCCC GCGACCGGGA GAATCCACCC

99701 AACAGGACTG GTAGCCCTTC TTCTTCAACT CGGCCATGTA GTTCCAGGCG

99751 TAGTTCTCCT CGCCCTTGAG GCCGGTCCCG GGCACGAAGA GGACGGTCGG

99801 CTTGTCACCG GCGTCACGCA GGTCCCCCAG CTCCGTCCCG CAGTGCAGCG

99851 CCTTGGCGAG CTCGGCCGCC GGTATCTCCA ACGGGGGAGA GGAAACATCC

99901 GCCGCCGAAG CGGCGGAGGC CGGAAGCACG GTGGCGGCCA GCACGGCCGC

99951 CACGAGTCCG CCGAGCCATG AGGACAAGCG CACGGTGACC TCCACAGGAA

100001 CCTTCACGAG TGAGCGGAAA CTCCCTCCGG AGGGAGCACC TCATCGTGCG

100051 GCGGCGCCAC AGTAGCCGTC AACTGCCCCA CGGGGCTGAG TAGTTGACAG

100101 TTGGCCGGGC TCGGCCGGCG AAGCGCCCGG GCCCCGCCGC CCCGCGCCGT

100151 GCGCGAGGGG TCCGTGACCT GGGTGGACGG TCCGGTTGGA CATCCCGGGG

100201 GAGCCTCTGG CATGGTCGCC CGTCCGTCCC CCTCAAGAAC CGAAGGGAGC

100251 GTCACGATCA CGATGATCGA AGTCAGCACG CGCAGCATGA AGGAAGCGGC

100301 TGCCGCCGAG CAGCTCCGCG CGGAGACCAC GACACTGGAC ATTCCAAAGG

100351 GTTTCGACCT GTGGACGGCC GACGAGATCG CGGAGTGGCT CGACGGCGTC

100401 GAGGACGACC CGGCAGTCTC CGACGCCGAC TTCTACGCGG CCCAGCAGCG

100451 GTGCGACGGG TCCTCGGCAC CGAGGGCACC TGACCCGCCG GCGGCCCTGC

100501 GCGGCCCTAC GTGTGCAGCG CCCCGTCCTC CTCCACATGC CCCTCCGGCT

100551 CCAGCTGGAT CGTCGAGTGG GCCACGTCGA AGTGGCCCCC GACACACCGC

100601 TGAAGGCGCC CCAGGAGCTC CCCGTACCCG CTCGCGAGAG CCTCCTCCGT

100651 GACCACCACG TGCGCGGTGA GCACCGGCAT CCCCGAGGTG ACCGTCCAGC

100701 CGTGCAGATC GTGCACGGCG ACCACGCCCC GCTCCTCCAG CAGGTGCCGG

100751 CGCACCTCGC CGAGGTCGAC GTCCTGCGGG GTCGCCTCCA GCAGGACGTG

100801 CAAGGAGTCC CGCAGCAGGC CGTACGCGCG CGGCACGATC AGCAGGCCGA

100851 TGACGATCGA CGCGATCGGG TCGGCGGCCT GCCACCCCGT GAGCAGGATG


100901 


ACCAGGCCGC 


CCACGATCAC 


CGCGACCGAG 




LGLLLAGLAL 


100951 


CTCCAGGTAC 


GCGCCCCGCA 


GATTGAGGCT 


Li 1L1 LL 1 lb 


GLGILLCGCA 


101001 


G C AG C C AC AG 


GCCCACCAGG 


TTGGCGGCGA 




LGLGALCACG 


101051 


AACATCAGGC 


CGCCCTTCAC 


CTCCACCGGC 


1 LGL 1 GAALL 


GGCCGATCGC 


101101 


CGAC CACAGG 


ACCCAGGCGA 


AGATGACGAC 


L Abb AO L AG C 


GCGTTCAGGA 


101151 


CCGCGGAGAA 


GATCTCCACG 


CGGTAGAACC 


/*** "A 7V 7V rnf~* 

LAAAGGTGCG 


CCGCGGCGTC 


101201 


GGCGCCCGCT 


GGGCGAGGGT 


GATGGCACCG 


AGGG C C AG C G 


AGACGCCGAC 


101251 


CGCGTCGGTC 


AGGCTGTGCG 


CGGCGTCGGC 


G AG L AG L G C G 


AGGCTGCCGG 


101301 


ACAGGAGCGC 


GCCGACCACC 


TGGATGACGG 


1 OA 1 LGAGL C 


GCTGATGCCG 


101351 


ATGGTCCACA 


GCAGGCGCTT 


GCGGTACGTG 


LLbb 1 GAG AG 


TGCCGCCCGC 


101401 


CGCCCCGGCG 


GACGGACCGT 


GGTCGTGCCC 


LAI bLLLbLb 


AG TG G AC C AC 


101451 


GGCGGCGCGG 


CACCCGCCAC 


CGAGCGGCCG 


LLbb 1 LbbL I 


LAG i GLAGCC 


101501 


GGGCCTGGGT 


GGAGGTGTCG 


CGCTGGTGCG 


Lj 1 \2 L- L vj AG 


LGGLGGLGGL 


101551 


AGCTCGCCCT 


GCTGCACCCT 


GACCGTGCGC 


HLbuobbLbb 


GGALLLGGAT 


101601 


GCCCTCGGCG 


CGGTAACGCT 


GGTGCAGGCG 


PTTPATP A AP 


i LGTGCTTGA 


101651 


TGCGGTACTG 


GTCGCTGAAC 


TCGCCGACGC 


LbAbbAI GAL 


LGTG AAG CTG 


101701 


ATCCGCGAGT 


CGCCGAAGGT 


GTGGAAGCGG 


Al LGLLGLL I 


CGTGGTCGGG 


101751 


GACCGCGCCG 


GTGATCTCGG 


CCATCACCTC 


G 1 LLALLALL 


TCGGTCGTGA 


101801 


CCTTCTCGAC 


CTGCTCCAGG 


TCGCTGTCGT 


AGL1GALLLL 


GACCTGCACC 


101851 


ATGATCGACA 


GCTCCTGCTC 


GGGGCGGCTG 


1 AG 1 1 GG 1 LA 


fnpii 1 » ■ i«pmp /■"*? /—i 

TGTTGGTGCC 


101901 GGCGAGCTTC GCGTTGGGGA TGATGACGAG GTTGTTGGAG AGCTGGCGGA

101951 CCGTGGTGTT GCGCCAGTTG ATGTCGACGA CGTAGCCCTC CTCCCCGCTG

102001 CTGAGCTGGA TGTAGTCGCC GGGCTGCACG GTCTTCGCGG CGAGGATGTG

102051 CACGCCCGCG AAGAGATTGG CGAGCGTGTC CTGCAGTGCG AGGGCGACCG

102101 CGAGACCTCC CACGCCGAGG GCGGTGAGCA GCGGTGCGAT GGAGATGCCG

102151 AGGGTCTGAA GGACGATGAG GAAGCCCATC GCGAGCACCA CGACGCGGGT



102201 GATGTTCACG AAGATGGTGG CCGATCCGGC CACTCCGGAG CGGGACTGTG

102251 CCACGGCCTT CACCAGGCCG GTGACGATCC GGGCCGCCGT GAGCGTGGCG

102301 GCCAGGATGA GCAGCGCGGT CAGCGTCATG GTGACGTTGC GTCCGGTGCG

102351 CGGCGTGAGC GGCAGCGCGC CCGCCGCGGC GGCGAGCCCG GCGGTGATGG

102401 CCGCGCAGGG CACGAGGGTG CGCAGGGCGT CGACGATGAC GTCGTCACCG

102451 CTCCACCGGG TTTTGCTCGC CCGTTCGCCG AGCCACCTCA GAAGTGCGCG

102501 GAGCAGCAGC CCGGCGACGA CGCCGGCGAC GACCGCGATA CCGGCCACGA

102551 TCCAGTCGTG CAGTGTGAGG GCACGGGTCA TCAGTTCGCT CCCGTCGTAC

102601 GGGGGGAGTG CGCCTGTGTG GGGCGTATGT GATGTGACGT CACCTTGTGA

102651 TACCTGCTCG ATTCCGGGGA GTGCGGTCAC GCCGGGACGA GAGCTCGGTT

102701 CCGGCGCGGA CGTCATCCTG CCCCATCCGC CCACGGCAGG CGTGCATACC

102751 CCCACCTGGA TCTTCACAGA CCGGCCACGT CTGTCCATGC GCCGATGAGC

102801 GCGCTGCCCG TGGTAAAGCA TTGAGTCAGG CGATTTGGCC ACTCGGCACT

102851 CGGCGGACCG GTCGAGCCGG TCGATCTACG TGAGCGGAGG CGGTTGAGCA

102901 TGGCGTCCAT GTGCAGACCC GGAATGTCAC CCGTCAATTC GCACAACGAG

102951 TGGGATCCGC TGGAGGAGAT CATCGTCGGG CGGCTGGAGG GCGCGACCAT

103001 TCCCTCCAGC CATCCGGTCG TGGCGTGCAA CATCCCGACC TGGGCGGCAC

103051 GGCTGCAGGG TCTCGCCGCC GGGTTCGAGT ATCCGCAGCG GCTGATCGAG

103101 CCGGCGCAGC AGGAGCTCGA CCAGTTCATC GCTCTCCTGC AATCCCTCGA

103151 CGTCACAGTG AGACGGCCGG CGGCCGTCGA CCACAAGCAC CGCTTCGGGA

103201 CCCCCGACTG GCAGTCGCGC GGCTTCTGCA ATTCCTGTCC GCGGGACAGC

103251 ATGCTCGTCG TCGGCGACGA GATCATCGAG ACCCCGATGG CGTGGCCGTG

103301 CCGCTGTTTC GAGACGCACT CGTACCGCGA ACTCCTCAAG GACTACTTCC

103351 GGCGCGGCGC GCGCTGGACG GCGGCGCCGC GCCCCCAGCT CACCGAGGCC

103401 CTGTACGAGA AGGACTTCCG CCCTCCCGAG GAGGGCGAAC GATGCGCTAC

103451 ATCCTCACCG AGTTCGAGCC GGTGTTCGAC GCGGCGGATT TCGTGCGGGC

103501 GGGCCGCGAC CTGTTCGTGA CGCGGAGCAA CGTCGCCAAC CTGCTGGGCA

103551 TCGAGTGGCT GCGCCGCCAC CTTCGGGCCG GAGTACCGCG TGCCACGAGA



Table

positions_all

gene      function                                    start   end
gdhA      glutamate dehydrogenase (partial)           1038    0
dapA      dihydrodipicolinate synthase                2140    1220
orf3      putative transcriptional activator          2211    3152
orf4      hypothetical protein                        3264    3680
orf5      hypothetical protein                        4307    3684
orf6      hypothetical protein                        4570    4758
orf7      hypothetical protein                        5058    5612
acpX      acyl carrier protein                        6010    5693
ksX       ketoacyl synthase                           8531    6045
monCII    probable epoxyhydrolase/cyclase             9542    8643
monE      methyltransferase                           10426   9596
monT      monensin resistance gene (ABC-transporter)  10656   12191
monRII    probable repressor                          12205   12780
monAIX    thioesterase                                13829   13023
monAI     polyketide synthase loading & module 1      14121   23198
            KS-L                                      14172   15486
            AT-L malonate specific                    15777   16880
            ACP-L                                     17019   17276
            KS1                                       17358   18626
            AT1 methylmalonate specific               18960   19976
            DH1 (potential)                           20019   20519
            KR1 (inactive)                            21636   22241
            ACP1                                      22536   22793
monAII    polyketide synthase module 2                23205   29921
            KS2                                       23307   24569
            AT2 methylmalonate specific               24891   25913
            DH2                                       25953   26369
            ER2                                       27600   28463
            KR2                                       28485   29042
            ACP2                                      29313   29570
monAIII   polyketide synthase modules 3 & 4           29974   42372
            KS3                                       30076   31347
            AT3 malonate specific                     31798   32838
            DH3                                       32884   33465
            KR3                                       34692   35181
            ACP3                                      35553   35811
            KS4                                       35899   37170
            AT4 methylmalonate specific               37489   38511
            DH4                                       38557   38982
            ER4                                       40123   40986
            KR4                                       41005   41562
            ACP4                                      41848   42105
monAIV    polyketide synthase modules 5 & 6           42448   54564
            KS5                                       42628   43890
            AT5 ethylmalonate specific                44221   45243
            DH5                                       45289   45744
            KR5                                       46785   47337
            ACP5                                      47593   47850
            KS6                                       47947   49218
            AT6                                       49579   50601
            DH6                                       50644   51075
            ER6                                       52222   53102
            KR6                                       53101   53661
            ACP6                                      54052   54306
monAV     polyketide synthase modules 7 & 8           54614   66934
            KS7                                       54716   55978
            AT7 methylmalonate specific               56300   57319
            DH7                                       57358   57802
            KR7                                       59048   59608
            ACP7                                      59867   60124
            KS8                                       60185   61453
            AT8 malonate specific                     61808   62839
            DH8                                       62882   63316
            ER8                                       64577   65437
            KR8                                       65456   66016
            ACP8                                      66404   66661
monAVI    polyketide synthase module 9                66952   72054
            KS9                                       67075   68340
            AT9 malonate specific                     68698   69729
            KR9 (potential)                                   71262
            ACP9                                      / 1 bob 71 7oo
1 1 \\Ji in                                           70 AC i 74993
monCI     FAD containing epoxidase                    76541   75051
monBII    double bond isomerase                       / oyou  7ficqp / DOOO
monBI     double bond isomerase                       77450   77016
monAVIII  polyketide synthase modules 11 & 12         88708   77447
            KS11                                      88612   87344
            AT11 methylmalonate specific              87022   85993
            KR11                                      85111   84562
            ACP11                                     84292   84035
            KS12                                      83962   82694
            AT12 methylmalonate specific              82354   81335
            DH12 (potential) delta                    81286   80855
            ER12 (potential)                          79618   78914
            KR12                                      78895   78337
            ACP12                                     78070   77812
monAVII   polyketide synthase module 10               93741   88816
            KS10                                      93636   92368
            AT10 methylmalonate specific              92040   91021
            KR10                                      90132   89584
            ACP10                                     89322   89068
monD      P450 oxygenase                              94081   95273
monRI     probable activator                          96141   95338
monAX     thioesterase                                96941   96138
orf29     cell wall biosynthesis capK homologue       97580   98953
lipB      lipase B                                    99983   98991
orf31     ion pump                                    101433  100507
orf32     membrane structural protein                 102581  101490
amtA      glycine amidinotransferase                  102924  103450
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GdhA, glutamate dehydrogenase (partial coding sequence) Length: 346 amino 
acids 

1 LTTRPDTKTA LSQKTALSQL LTEIEHRNPA QPEFHQAARE VLETLAPVIA
51 ARPEYAEAGL IERLCEPERQ IVFRVPWQDD HGRVRVNRGF RVEFNSALGP 
101 YKGGLRFHPS VNLGVIKFLG FEQIFKNALT GLGIGGGKGG SDFDPRGRSD 
151 AEVMRFCQSF MTELYRHIGE HTDVPAGDIG VGGREIGYLF GQYRRITNRW 
201 EAGVLTGKGR NWGGSLIRPE ATGYGNVLFA AAMLRERGET LEGRTAWSG 
251 SGNVAIYTIQ KLAALGANAV TCSDSSGYW DEKGIDLDLL KQVKEVERAR 
301 VDTYAQRRGA SARFVPGRRV WEVPADIALP SATQNELDAD DATALI 

DapA, dihydrodipicolinate synthase Length: 307 amino acids

1 MTLASSLEPT TEPLFNGLYV PLVTPFTDDL RLAPEALARL ADEALSAGAS 

51 GLVALGTTAE AATLTAEERE TVIRVCSAAC RAHGAPLIVG VGTNDTATAI 

101 TALRELAARG DVAAALVPAP PYIRPGEAGT LAHFAALAEH GGLPLVVYDI. 

151 PYRTGQTLGA GTITALGRLP EVVGIKHATG SIDPTTMELL DSPLPGFAVL 

201 GGDDIVLSPL VAAGAHGGIV ASANLRTADY AEMIALWRRG SAAPARALGA 

251 DLARLSAALF TEPNPTVIKG VLHAQNRIPS PAVRMPLLAA SADSVRRAAP 

301 LAASRK* 

ORF3, putative transcriptional activator protein Length: 314 amino acids 

1 MLDVRRLHLL RELDRRGTIA AVAEALTFTA SAVSQQLGVL EREAGVPLLE 

51 RSGRRWLTP AGRSLVAHAD AVLNRLEQAV AELAGARDGI GGPLRIGTFP 

101 SGGHTIVPGA LAELASRHPA LEPMVREIDS ARVSDGLRAG ELDVALVHDY 

151 DFVPATPDTT VDEVPLLEEP MYLVTHAADT ATDSGSGSTL AALLGPCAEV 

201 PWITARDGTT GHAMAVRACQ AAGFQPRIRH QVNDFRTVLA LVAAGQGAGF 

251 VPRMAAEPSP AGVVLTKLPL FRRSKVAFRA GGGAHPAIAA FVAAATTAVE 



301 RMAGSRGPAG GSE*

ORF4, hypothetical protein Length: 139 amino acids 

1 MADDAYLFLL PDRHPRLGAA LAAVGALECT ETPAVHAWLQ AHEASVSSEQ 
51 VRILPADAET LIPKDAERLP VPLSEEEALK VEQECAPQTV TDMESELLAF 
101 RETTQDWQAL VHRALTAGIP AQRIARLTGL DPEEIGRL* 

ORF5, hypothetical protein Length: 208 amino acids 

1 LAVAACAAW LPIDAWRIS AADVGVLVFF AYLLPYLAIT MTVFVSVAPE 
51 QVRSWARREA RGTFLQRYVL GTAPGPGGSL FIAAAALWA VLWLPGHLST 
101 TFSALPRTLV ALALWAAWI CWVAFAVTF QADNLVENER ALEFPGERSP 
151 AWADYVYFAL AAMTTFGTTD VDVTSRDMRR TVAANTVIAF VFNTVTVAIL
201 VSALGGR* 

ORF6, hypothetical protein Length: 63 amino acids 

1 MTVMDKLKQM LKGHEDKAGQ GIDKAGDFVD GKTQGKYSGQ VDTAQDKLRD 
51 QFGSDQQEPP QR* 
ORF7, hypothetical protein Length: 185 amino acids 

1 MGTAQSQEQA AAPGACAAFV RFVLCGGGVG LASSFAVVAL ASWVPWALAN 

51 ALVAWSTW ATELHARFTF GAGGRATWRQ HAQSAGSAAA AYAVTCVAMF 

101 VLQQLVAAPG AVLEQWYLS ASALAGVARF WLRLWFAR NRSLPAAAAV

151 RTARPVRRVP APVPATVAHA ASRPAGPAAL CPAA* 

AcpX, acyl carrier protein (ACP) Length: 106 amino acids 

1 MTSTDHTSGQ DATELEKQLA AATPEEREKL LTDTIRTQAG TLLNTTLSDD 
51 SNFLENGLNS LTALELTKTL MTLTGMEIAM VAIVENPTPA QLAHHLGQEL 
101 AHTTA* 



KsX, ketoacyl-ACP synthase Length: 829 amino acids 

1 VANEEKLVEY LKWTTAELHQ AQQQLRELKA AQHEPIAWS MACRLPGKTR 

51 TPDDLWDLVS EGRDAVTGFP DDRAWELPEE RPYAELGGFL DDAAGFDAGF 

101 FDISDTEAVA TEPLQRLMLH LAWETVERGH IAPHTLRSTL TGVYVGATGH 

151 DYATRLETAP DELLPYLGGG TSGSLVSGRI AYALGLEGPA ISVDTACSSS 

201 LVALHLACQA LRRGECGLAL AGGGTVMSTP HTFHAFAHQK SLAQDGRCKP 

251 FAAAADGMGL GEGVGLVLLE RLGDARKNGH PVLAVIRGSA VNQDGAGYGL 

301 AAPNGPSQQH VIRAALADAG LTPDQIDAVE AHGTGTPIGD AIEVQALLAT 

351 YGADRSPDRP LWLGSVKSNT GHTQGAAGAA ALIKMVQAFR HGTLPPTLHV 

401 DRPTPLAAWK KGAVRLLTEA VDWPRREEPR RVGISAFATS GTNAHLILEE

451 PPVDEAPVPD AARDQTSPVA PELPVAWSLS ARTPEALRAQ AKALVTHLAA 

501 TDPAPSPAEV AYSLAATRSP LEHRAVLTGT DHTELLAAAR ALAAGEDHPD 

551 LVRSTPGAGP KKIAWHFDGR PADGVTTGAA PGAKPGATFG ATFGAAFGGA

601 EFHSAFPLFA SAFDEARALL DTHLPTPLPT PHSELARFAV HTALARLLLE 

651 TGVRPHTLTG DGVGHIAAAY AAGILTLDDA CRLAAAHAAA AQAAEGEQPA 

701 PPDAYEPVLK QLTFQRATLT LTSTAPADTP IASADYWHHH LTSPAPTAPP 

751 TPETHTLLHL GALSPEGTQT SAVSALLTAL ARLHTTGGTV DWTPLVRRTP

801 HPRTIDLPTY SFQATRYWLH DHTAHAAV* 

MonCII, probable epoxyhydrolase/cyclase Length: 300 amino acids 

1 VKNLRIPVSQ TVSLNVRYRP ADGPGAPGRP FLLLHGMLSN ARMWDEVAAR 

51 LAAAGHPAYA VDHRGHGESD TPPDGYDNAT WTDLVAAVT ALDLSGALVA 

101 GHSWGAHLAL RLAAEHPDLV AGLALIDGGW YEFDGPVMRA FWERTADWR 

151 RAQQGTTSAA DMRAYLRATH PDWSPTSIEA RLADYRVGPD GLLIPRLTST 

201 QVMSIVAGLQ REAPADWYPK VTVPVRLLPL IPAIPQLSDQ VRAWVAAAEA 



251 ALEQVSVRWY PGSDHDLHAG APDEIAADLL LLARSCEAMP GGKAGVRPA*

MonE, S-adenosylmethionine-dependent methyltransferase Length: 277
amino acids

1 VNKTVAPEPS DIGHYYDHKV FDLMTQLGDG NLHYGYWFDG GEQQATFDEA 
51 MVQMTDEMIR RLDPAPGDRV LDIGCGNGTP AMQLARARDV EWGISVSAR 
101 QVERGNRRAR EAGLADRVRF EQVDAMNLPF DDGSFDHCWA LESMLHMPDK 
151 QQVLTEAHRV VKPGARMPIA DMVYLNPDPS RPRTATVSDT TIYAALTDIG 
201 DYPDIFRAAG WTVLELTDIT RETAKTYDGY VEWIRAHRDE YVDIIGVEGY 
251 ELFLHNQAAL GKMPELGYIF ATAQRP* 

MonT, putative monensin resistance gene (ABC-transporter) Length: 512
amino acids



1 MSADLGARRW WAVGALVLAS MWGFDVTIL SLALPAMADD LGANNVELQW
51 FVTSYTLVFA AGMIPAGMLG DRFGRKKVLL TALVIFGIAS LACAYATSSG
101 TFIGARAVLG LGAALIMPTT LSLLPVMFSD EERPKAIGAV AGAAMLAYPL
151 GPILGGYLLN HFWWGSVFLI NVPWILAFL AVSAWLPESK AKEAKPFDIG
201 GLVFSSVGLA ALTYGVIQGG EKGWTDVTTL VPCIGGLLAL VLFVMWEKRV
251 ADPLVDLSLF RSARFTSGTM LGTVINFTMF GVLFTMPQYY QAVLGTDAMG
301 SGFRLLPMVG GLLVGVTVAN KVAKALGPKT AVGIGFALLA AALFYGATTD
351 VSSGTGLAAA WTAAYGLGLG IALPTAMDAA LGALSEDSAG VGSGVNQSIR
401 TLGGSFGAAI LGSILNSGYR GKLDLDGVPE QAHGAVKDSV FGGLAVARAI
451 KSNGLADSVR SAYVHALDW LVVSGGLGLL GWLAWWLP RHVGQSTAKT
501 AESEHEAADA










MonRII, probable repressor protein Length: 192 amino acids 


1 VPGLRERKKA RTKAAIQREA VRLFREQGYT ATTIEQIAEA AEVAPSTVFR
51 YFATKQDLVF SHDYDLPFAM MVQAQSPDLT PIQAERQAIR SMLQDISEQE
101 LALQRERFVL ILSEPELWGA SLGNIGQTMQ IMSEQVAKRA GRDPRDPAVR 
151 AYTGAVFGVM LQVSMDWAND PDMDFATTLD EALHYLEDLR P* 

MonAIX, thioesterase Length: 269 amino acids 

1 MDRGTAARAP QIGDEFGAAT GNGVWLRRYH AAAEAPVRLV CFPFAGGSAS 
51 YYFGLSGLLA PGVEVLAVQY PGRQDRHAEP CLASVAELAD GWPHLPCDG 
101 KPFALFGHSL GAIVAFEVAR RLRGPAGPGL PVHLFVSGGL ARPYRPAGRS 
151 GAFGDADILA HLRAMGGTDE RFFRSPELQE LVLPALRADY RAVATYEAPG 
201 PGRLDCPITA LIGDADERTS PEQAATWRER TGAAFDLRVL PGGHFYLDGC 
251 QEQVAAWTE ALTAGPGV* 

MonAI, polyketide synthase multienzyme MONS 1, housing loading module 
and extension module 1 Length: 3026 amino acids 



1 MAASASASPS GPSAGPDPIA WGMACRLPG APDPDAFWRL LSEGRSAVST

51 APPERRRADS GLHGPGGYLD RIDGFDADFF HISPREAVAM DPQQRLLLEL

101 SWEALEDAGI RPPTLARSRT GVFVGAFWDD YTDVLNLRAP GAVTRHTMTG

151 VHRSILANRI SYAYHLAGPS LTVDTAQSSS LVAVHLACES IRSGDSDIAF

201 AGGVNLICSP RTTELAAARF GGLSAAGRCH TFDARADGFV RGEGGGLWL

251 KPLAAARRDG DTVYCVIRGS AVNSDGTTDG ITLPSGQAQQ DWRLACRRA

301 RITPDQVQYV ELHGTGTPVG DPIEAAALGA ALGQDAARAV PLAVGSAKTN

351 VGHLEAAAGI VGLLKTALSI HHRRLAPSLN FTTPNPAIPL ADLGLTVQQD

401 LADWPRPEQP LIAGVSSFGM GGTNGHVWA AAPDSVAVPE PVGVPERVEV

451 PEPVWSEPV WPTPWPVSA HSASALRAQA GRLRTHLAAH RPTPDAARVG

501 HALATTRAPL AHRAVLLGGD TAELLGSLDA LAEGAETASI VRGEAYTEGR

551 TAFLFSGQGA QRLGMGRELY AVFPVFADAL DEAFAALDVH LDRPLREIVL

601 GETDSGGNVS GENVIGEGAD HQALLDQTAY TQPALFAIET SLYRLAASFG



651 LKPDYVLGHS VGEIAAAHVA GVLSLPDASA LVATRGRLMQ AVRAPGAMAA 

701 WQATADEAAE QLAGHERHVT VAAVNGPDSV VVSGDRATVD ELTAAWRGRG 

751 RKAHHLKVSH AFHSPHMDPI LDELRAVAAG LTFHEPVIPV VSNVTGELVT 

801 ATATGSGAGQ ADPEYWARHA REPVRFLSGV RGLCERGVTT FVELGPDAPL 

851 SAMARDCFPA PADRSRPRPA AIATCRRGRD EVATFLRSLA QAYVRGADVD 

901 FTRAYGATAT RRFPLPTYPF QRERHWPAAA GVGQQPETPE LPESSESSEQ 

951 AGHEREEGAR AWGGPEGRLA GLSVNDQERV LLGLVTKHVA WLGDASGTV 

1001 QAARTFKQLG FDSMAAAELS ERLGTETGLP LPATLTFDYP TPLAVAAHLR 

1051 AELTGTPAPA GSAPATGALG AGDLGTDEDP VAIVAMSCRY PGGAGTPEDL

1101 WRLVADGADA IGDFPTDRGW DLARLFHPDP DRSGTSCTRQ GGFLYDAADF

1151 DAEFFDISPR EALAVDPQQR LLLECAWEAF ERAGLDPRAL KGSPTGVFVG 

1201 MTGQDYGPRL HEPSQATDGY LLTGSTPSVA SGRLSFSFGL EGPALTVDTA 

1251 CSSSLVTLHL AAQALRRGEC DLALAGGATV LATPGMFTEF SRQRGLAPDG

1301 RCKPFAAGAD GTGWAEGVGL VLLERLSEAR RKGHAVLAVI RGSAINQDGA 

1351 SNGLTAPNGP SQQRVIRAAL AAARLTADEV DWEAHGTGT TLGDPIEAQA 

1401 LLATYGQGRS AERPLWLGSV KSNIGHTQAA AGVAGVIKMV MAMRHDLLPA

1451 TLHVDEPSGH VDWSTGAVRL LTEPWWPRG ERPRRAAVSS FGISGTNAHL

1501 VLEEAGQDEY VAGAADDAGP VDGAVLPWW SGRTGAALRE QARRLRELVT 

1551 GGSADVSVSG VGRSLVTTRA VFEHRAVWG RDRDTLIGGL EALAAGDASP 

1601 DWCGVAGDV GPGPVLVFPG QGSQWVGMGA QLLGESAVFA ARIDACEQAL 

1651 SPYVDWSLTE VLRGDGRELS RVDWQPVLW AVMVSLAAVW ADHGVTPAAV

1701 VGHSQGEIAA WVAGALTLE DGAKIVALRS RALRQLSGGG AMASLGVGQE

1751 QAAELVEGHP GVGIAAVNGP SSTVISGPPE QVAAWADAE ARELRGRVID 

1801 VDYASHSPQV DAITDELTHT LSGVRPTTAP VAFYSAVTGT RIDTAGLDTD 



1851 YWVTNLRRPV RFADAVTALL ADGHRVFIEA SSHPVLTLGL QETFEEAGVD

1901 AVTVPTLRRE DGGRARLARS LAQAFGAGCA VRWENWFPAT GTSTVELPTY 

1951 AFQRRRYWLE APTGTQDAAG LGLAAAGHPL LGAATEIADG DIRLLTGRIS 

2001 RHSHPWLAQH TLFGAAVVPA SVLAEWALRA ADEAGCPRVD DLTLRTPLVL 

2051 PETAGVQVQI WGPADARDG HRDFHVYARP DGKDASEGEG IAEGEGASEG 

2101 EGASGGTDAP WTCHADGRLV AEPTGTASED SPDTVWPPPG AEPVDLGDFY 

2151 ERAAATGVGY GPVFTGLRAL WRRDGELFAE AVLPQEAPET AGFGMHPALL 

2201 DAALHPALLG ERPAEEDKVW LPFTLTGVTL WATGATSVRV RLTPLDDDPD

2251 ASADGRAWRV GVSDPTGAEV LTCEALVAVA AGRRELRAAG ERVSDLYAVE 

2301 WVPVPGPGPV GEGADFSGWA GLGECGERWE CVGRVERWYE DLDALGAAVE 

2351 GGASVPSWL ATAAAAPGGA GDGAADALSA VRWTGALLDQ WLADARFADA 

2401 RLWITSGAV ATGDDFLPDP AAAAVRGLVE QAQVRHPGRI LLVDTEAGAG

2451 LGVGAGVDDA LLEQAVAMAL GADEPQLALR AGRVLAPRLT APQDAAVTEA

2501 ARPLDPDGTV LITGPAGAPV ADLAEHLVRT GQCRHLLLLP GDGELEEMAE 

2551 ELRGLGATVD LSTADPADPT ALAEWAAVE GDHPLTGVIH ATGWDAFDP 

2601 GDSASDLMID SASDSFAEAW SSRAGVTAAL HTATAHLPLD LFAVLSPAGA

2651 DLGIARSAAA AGADAFSAAL ALRRHTTVTT DTTAPPRTTA PPRTTASPRT

2701 TALSSSRTTG VALAYGPPTA PRPGIKGTAP GRIPVLLDAA RAHGGGSPLL 

2751 GARLAARALA AESAAEGVAG LPAPLRALAV AAAAAGAPTR RTAADRKPPA 

2801 DWPARLAPLS APEQLRLLID AVRTHAAAVL GRTDPEALRG DATFKQLGLD

2851 SLTAVELRNR LVEDTGLRLP TALVFRYPTP AAIAAHLRER LTSPSETTAT

2901 QRSGGQTPAA GQASSALAPG GSAAGPPAAD TVLSDLTRME NTLSVLAAQL

2951 PHTETGEITT RLEALLTRWK TTNATANDSG DGNGGDDDAA ERLKAASADQ

3001 IFDFIDNELG VGHGTSRVTP TPKAG* 



MonAII, polyketide synthase multienzyme MONS 2, housing extension module 
2 Length: 2239 amino acids 



1 MASEEQLVEY LRRVTTELHD TRRRLVQEED RRQEPVALVG MACRFPGGVA

51 SPEDLWDLVA AGKDAIEDFP TDRGWDLEAL YDPDPAAYGT SYVRHGGFVD

101 DAGSFDADFF GISPREALAM DPQQRLMLET SWELFERAGI EPVSLKGSRT

151 GVYAGVSSED YMSQLPRIPE GFEGHATTGS LTSVISGRVA YNYGLEGPAV

201 TVDTACSASL VAIHLASQAL RQRECDLALA GGVLVLSSPL MFTEFCRQRG

251 LAPDGRCKPF AAAADGTGFS EGIGLLLLER LSDARRNGHK VLAVIRGSAV

301 NQDGASNGLT APNDAAQEQV IRAALDNARL TPSEVDAVEA HGTGTKLGDP

351 IEAGALLATY GQHRARPLLL GSLKSNIGHT HATAGVAGVI KTVMAIRNGL

401 LPATLHVEEL SPHVDWDAGA VEWTEPTPW PETGHPRRAG VSAFGISGTN

451 AHLILEEAPP EEDVPAPWV ESGGWPWW SGRTPEALRE QARRLGEFVA

501 GDTDALPNEV GWSLATTRSV FEHRAVWGR DRDALTAGLG ALAAGEASAG

551 WAGVAGDVG PGPVLVFPGQ GAQWVGMGAQ LLDESAVFAA RIAECERALS

601 AHVDWSLSAV LRGDGSELSR VEWQPVLWA VMVSLAAVWA DYGVTPAAVI

651 GHSQGEMAAA CVAGALSLED AARIVAVRSD ALRQLQGHGD MASLSTGAEQ

701 AAELIGDRPG VWAAVNGPS STVISGPPEH VAAWADAEA RGLRARVIDV

751 GYASHGPQID QLHDLLTERL ADIRPTNTDV AFYSTVTAER LTDTTALDTD

801 YWVTNLRQPV RFADTIEALL ADGYRLFIEA SAHPVLGLGM EETIEQADMP

851 ATWPTLRRD HGDTTQLTRA AAHAFTAGAD VDWRRWFPAH PAPRTTHTPT

901 YAFQRRRYWL ADTVKRDSGW DPAGSGHAQL PTAVALADGG WLNGRVSAE

951 RGGWLGGHW AGTVLVPGAA LVEWVLRAGD EAGCPSLEEL TLQAPLVLPE

1001 SGGLQVQWV GAADEQGGRR DVHVYSRSEQ DASAVWQCHA VGELGRASVA

1051 RPVRQAGQWP PAGAEPVEVG GFYEGVAAAG YEYGPAFRGL RAMWRHGDDL



1101 LAEVELPEEA GSPAGFGIHP ALLDAALHPL LAQRSRDGAG AGAHGGQVLL

1151 PFSWSGVSLW ASEATTVRVR LTGLGGGDDE TVSLTVTDPA GGPWDVAEL

1201 RLRSTSARQV RGSAGPGADG LYELRWTPLP EPLPVPAPAN GRDVAADLSG 

1251 CAVLGELVAE PGPGIDLEGC PCYPGVGALA DNASPPSMIL APVHSDTTGG 

1301 DGLALTERVL RVIQDFLAAP SLEQKQTRLA FVTRGAADTG STTGGSAAPA 

1351 EAVDPAVAAV WGLVRSAQSE NPGRFVLLDT DAPLDQASVA PLVDAVRSAV 

1401 EADEPQVALR GGRLLVPRWA RAGEPVELAG PAGARAWRLV GGDSGTLEAV 

1451 VAEACDDIVL RPLAPGQVRV AVHTAGVNFR DVLIALGMYP DPDALPGTEA

1501 AGWTEVGPG VTRLSVGDRV MGMMDGAFGP WAVADARMLA PVPPGWGTRQ 

1551 AAAAPAAFLT AWYGLVELAG LKAGERVLIH AATGGVGMAA VQIARHVGAE 

1601 VFATASPGKH AVLEEMGIDA AHRASSRDLA FEDAFRQATD GRGVDWLNS

1651 LTGELLDASL RLLGDGGRFV EMGKSDPRDP ELVALEHPGV SYEAFDLVAD

1701 AGPERLGLML DRLGELFAGG SLVPLPVTAW PLGRAREALR HMSQARHTGK 

1751 LVLDVPAPLD PDGTVLVTGG TGTIGAAVAE HLARTGESKH LLIVSRSGPA 

1801 AHGAEELVSR IAEFGAEATF VAADVSEPDA VAALIEGIDP AHPLTGWHA 

1851 AGVLDNALIG SQTTESLTRV WAAKAAAAQQ LHEATRESRL GLFVMFSSFA 

1901 STMGTPGQAN YSAANAYCDA LAALRRAEGL AGLSVAWGLW EATSGLTGTL

1951 SAADRARIDR YGIRPTSAAR GCALLAAARA HGRPDLLAMD LDARVPAASD 

2001 APVPAVLRTL AAAGAPATAR PTAAAAADGA TDWSGRLAGL TEEARLELLT

2051 ELVCTHAAGV LGHADAGAVQ VDAPFKELGF DSLTAVELRN RIAAATGLKL 

2101 PAALVFDYPQ ARVLAAHLAE RLVPEGAGAM GGVSGAEGVR DAYGAGGPGG 

2151 DMTAQVLLEV ARVEHTLSAA VPHGLDRAAV AARLEALLAR CTATTAATGA 

2201 AGAAVEGDGD SDGDGAVDQL ETATAEQVLD FIDNELGV* 



MonAIII, polyketide synthase multienzyme MONS 3, housing extension 
modules 3 and 4 Length: 4133 amino acids 



1 MVSEEKLVDY LKRVSADLHA TRQRLREAEE RGQEPVAWE AACRYPGGIR

51 TPEDLWDLVA AGGNALGAFP DNRGWDLRRL FHPDPDHPGT TYAREGGFLH

101 DADLFDPEFF GISPREAAVL DPQQRLLLEC AWEALERAGI DPRSLQGSRT

151 GVYAGAALPG FGTPHIDPAA EGHLVTGSAP SVLSGRLAYT FGLEGPAVTI

201 DTACSSSLVA VHLAAHALRQ RECDLALAGG VTVMTTPYVF TEFSRQRGLA

251 ADGRCKPFAA AADGTAFSEG AGLLVLERLS DARRAGHRVL AVIRGSAVNQ

301 DGASNGLTAP NGPAQQRVIR AALAGARLSP AEVDAVEAHG TGTRLGDPIE

351 ADALLATYGQ ERHGGRPLWL GSVKSNIGHT QGAAGAAGLI KMVQALRHET

401 LPATLYADEP TPHADWESGA VRLLSAPVAW PRGEHGEHTR RAGISSFGIS

451 GTNAHLILEE APAADAEGAG GDGDGDGGGV RPWRVGATG PREEQGQGQG

501 QEQHQQQRQQ RQRSSMMPTP HLPWLLSARS PAALRAQADA LANHVAHADH

551 SIADIGGTLL RRTLFEHRAV VLGTDRDERA AALAALAAGR AKPALTRAAG

601 PARNGGTAFL FTGQGSQRPG MGRQLYDTFD VFAESLDETC ARLDPLLEQP

651 LKPVLFAPAD TAQAAVLHGT GMTQAALFAL EVALYRQVTS FGIAPSHLTG

701 HSVGEIAAAH VAGVFSLADA CTLVAARGRL MQALPAGGAM LAVQAAEDDV

751 LPLLAGQEER LSLAAVNGPT AVWSGEAAA VGEVEKALRG RGLKTKRLNV

801 SHAFHSPLIE PMLDDFREVA RGLTFHAPTL PWSNLTGRL ADAELMADAE

851 YWVRHVRRPV RFHDGLRALS EQGWRYLEL GPDPVLATMV QDGLPAPAEG

901 EEPEPWAAA LRSKHDEGRT LLGAVAALHT DGQPADLTAL FPADAGQVPL

951 PTYRFQRRRY WRVAPDAAAP ARAAGLQETG HPLLPAVIRQ ADGGILLAGR

1001 LSLRTHPWLA DHTIAGGVPL PATAFVELAL LAGRHAACDT IDDLTLETPL

1051 LLDDTGTGVG AAVGAGADAL VDAIEVQLAL GAPDGSGRRA LTVHSRPADD

1101 AADDGDAADA ADAAGRGGPG GSGDLGDPGD PGDLGDGGGS RGWRRHATGI



1151 LSAGPAAEPA APDAAPWPPA DATALDVDAL YARLDAQGYS YGPAFRAVHA

1201 AWRHGDDLYA DVRLADEQRA EADAFALHPA LLDAALHAVD ELYRGSEGRG

1251 QEQGQGGQEP EQGRGDADAP VRLPFSFSDI RHHATGATRL WVRLSPQGDD

1301 RLRLSLTDGE GGQVATVDAL QLRLIPADRW RAARPTTAAP LYHLDWHELP

1351 LPEPAETDPA AHSWAVLGAH DAGLAPAAHY PDLAALKAAV EAGEPVPDIV

1401 FAPFPAQGTE TDVPAQVRAH ARHALELLRD WLTTEAFAAA RLWLTTGAV

1451 TARPEDGPAD LATAPVWGLV RAAQAEQPDH WLVDIDKDI DKDTDEETDQ

1501 ATDAGTASRH ALPAALAAAA AQAETQLALR AGTVLVPRLA WPPRTDTPA

1551 LHATAPESTT DTVDSTGIAG AAESGGTVLI TGGTGGLGQA VARHLAAAHG

1601 ARHLLLVSRR GDAAEGVAEL RADLADDGVD VRVAACDITD RDALAGLLAD

1651 IPAAHPLTAV VHTAGVIDDS LITAMTPERL DAVLAPKADA AWHLHELTRD

1701 KDLSAFVLFS SGASVLGNGG QANYAAANTF LNTLAEHRRA AGLAATSVAW

1751 GLWESASGGM AARLGDADRA RIHRTGVTGL TDEQALALFD AALTAEHPTV

1801 LATRFDRAVL RGQAAARTLQ PALRGLVRTP RPTASAGAIG STAATGSATD

1851 ENAPSSWAAR LARLSAADRD RALNELIREQ IATVLAHPSP DTIELGRAFQ

1901 ELGFDSLTAL ELRNRLSTAT GIRLPATLVF DHPSPTALVR HLHSHLPDEA

1951 QHTSPTAPGA SAEGTAATAT GIDDDPIAIV GMACRYPGGV TSPEQLWQLV

2001 ATGTDAIGPF PEDRGWDTAG LFDPDPDQVG HSYTREGGFL YDAARFDAGF

2051 FGISPREAAA TDPQQRLLLE TAWQAFEHAG IDPAALRGTP CGVITGIMYD

2101 DYGSRFLARK PDGFEGRIMT GSTPSVASGR VAYTFGLEGP AITVDTACSS

2151 SLVAMHLAAQ ALRQGECELA LAGGVTVMAT PNTFVEFSRQ RGLAPDGRCK

2201 PFAAAADGTG WGEGAGLVVL ERLSDARRKG HRVLALLRGS AVNQDGASNG

2251 MTAPNGPSQE RVIRTALAGA GRGPEDIDW EAHGTGTTLG DPIEAQALLA

2301 TYGQGRPEDR PLWLGSVKSN IGHTQAAAGV AGVIKMVMAL RHEQLPTTLH



2351 ADEPTPHVQW DGGGVRLLTE PVPWSRGERT RRAGVSSFGI SGTNAHLILE

2401 EPPEEDLPEP VAAEPGGVVP WWSGRTPDA LREQARRLGE FWGAGDVSA

2451 AEVGWSLATT RSVFEHRAW AGRDRDDLVA GMQALAAGET PTDWSGAAA

2501 SSGAGPVLVF PGQGSQWVGM GAQLLDESPV FAARIAECEQ ALSAYVDWSL

2551 SDVLRGDGSE LSRVEWQPV LWAVMVSLAA VWADYGVTPA AVVGHSQGEM

2601 AAACVAGALS LEDAARIVAV RSDALRQLQG HGDMASLGTG AEQAAELIGD

2651 RPGVWAAVN GPSSTVISGP PEHVAAWAE AEARGLRARV IDVGYASHGP

2701 QIDQLHDLLT EGLADIRPAN TDVAFYSTVT AERLTDTTAL DTDYWVTNLR

2751 QPVRFADTIE ALLADGYRLF IEASAHPVLG LGMEETIEQA DIPATWPTL

2801 RRDHGDTTQL TRAAAHAFTA GADVDWRRWF PADPTPRTVD LPTYAFQHQH

2851 YWLEEPSGLT GDAADLGMVA AGHPLLGACV ELAESDSYLF TGRLSRRAPS

2901 WLAEHVVAGT VLVPGAALVE WVLRAGDEAG CPTIEELTLQ APLVLPESGG

2951 LQVQVWGAT DEQSGRRDVH VYSRSEQDAS AVWVCHAVGV VSSEMPEAAA

3001 ELSGQWPPAG AEAVDVEDFY ARAAEAGYAY GPAFQGLRAL WRHGTELFAE

3051 WLPEQAGGH DGFGIHPALL DAALHPLMLL DRPADGQMWL PFAWSGVSLN

3101 ADRATHVRVR LSPRGEAAER DLRVVIADAT GAPVLTVDAL TLRAADPGRL

3151 GAAARGGVDG LYTVDWTPLP LPQPLPLPRT DAGGSADWVI LSDNSSAALA

3201 DAVSSATAAG GGAPWALLAP VGGGSADDGL PWRRTLSLV QEFLAAPELT

3251 ESRLVIVTRG AVATDADGDV AASAAAVWGL IRSAQSENPG RFVLLDVEEE

3301 HLHPDGGELP YAALRHAVEE LDEPQLALRS GKFLVPRMTP AAAPEELVPP

3351 VGTSGWRLGT SGTATLENLS VIDAPEAFAP LEPGQVRISV RAAGMNFRDV

3401 LIALGMYPDK GTFAGSEGAG HVTEVGPGVT HLSVGDRVMG LFEGAFAPLA

3451 VADARMWPI PEGWSFQEAA AVPWFLTAW YGLVDLGRLR AGESLLIHAG

3501 TGGVGMAATQ IARHLGAEVF ATASPAKHGV LDGMGIDAAH RASSRDLDFE

3551 ETLRAATGGR GMDWLNSLA GEFTDASLRL LAEGGRMVDM GKTDKRDPDR

3601 VAAEHAGAWY RAFDLVPHAG PDRIGEMLAE LGELFASGAL APLPVQTWPL

3651 GRAREAFRFM SQAKHTGKLV LEIPPALDPD GTVLITGGTG VLAAAVAEHL

3701 VREWGVRHLL LAGRRGSEAP GSSELAEELT ELGAEVTFAA ADVSDPDAVA

3751 ELVGKTDPAH PLTGVIHAAG VLDDAWTAQ TPESLARVWA AKATAAHLLH

3801 EATREARLGL FLVFSSAAAT LGSPGQANYA AANAYCDALV RQRRAEGLAG

3851 LSIGWGLWQT ASGMTGHLGE TDLARMKRTG FTPLTTEGGL ALLDAARAHG

3901 RPHWAVDLD ARAVAAQPAP SRPALLRALA AGATPGARTA RRTAAAGSVA

3951 PAGGLADRLA GLPHPERRRL LLDLVRGNVA GVLGHSDHDA VRPDTSFKEL

4001 GFDSLTAVEL RNRLAAATGL KLPAALVFDY PESATLVDHL LERLSPDGAP

4051 PPVKDAADPV LNDLGRIESS LDALALDADA RSRVTRRLNT LLSKLNGAAT

4101 AGSPADVTDL DALDALDDVS DDEMFEFIDR EL*





MonAIV, polyketide synthase multienzyme MONS 4, housing extension 
modules 5 and 6 Length: 4039 amino acids 

1 MSSAEESSPD VSGTGVSGTG ESATGTSSTE AKLRQYLKRV TVDLGQARRR 

51 LREVEERAQE PIAIVSMACR FPGDTRTPEA LWDLVAEGGD AIDDFPTNRG 
101 WDLESLYHPD PDHPGTSYVR RGGFLYDAPA FDASFFGISP REALAMDPQQ 
151 RVLMETAWQL LERAGIDPAS LKLSATGVYI GAGVLGFGGA QPDKTVEGHL 
201 LTGSALSVLS GRISFTLGLE GPSVSVDTAC SSSLVSMHLA AQALRQGECD 
251 LALAGGVTVM STPGAFTEFS RQGALSPDGR SKAFAASADG TGFSEGAGLL 
301 LLERLSDARR NGHKVLAVIR GSAVNQDGAS NGLTAPNGPS QERVIRAALA 
351 NAGLGAAEVD AVEAHGTGTK LGDPIEAGAL LATYGRDRDE DRPLWLGSVK 
401 SNIGHPQGAA GVAGVIKMVM ALQRELLPAT LYVDEPTPHV DWSSGSVRLL
451 TEPVPWTRGE RPRRAGVSAF GMSGTNAHVI LEEAPPEEAA AAETPAEGTG
501 AWPWWSGR GEEALRAQAA QLAEHVRDDD QRPASPLEVG WSLATTRSVF 



551 ENRAVWGDD RDALLDGLRS LAAGEASPDV VSGAVGPTGP GPVMVFPGQG 

601 GQWVGMGARL LDESPVFAAR IAECEQALSA YVDWSLTDVL RGDGSELARI 

651 DWQPVLWAV MVALAAVWAD QGIEPAAWG HSQGEIAAAC WGAISLDEA 

701 ARIVAVRSVL LRQLSGRGGM ASLGMGQEQA ADLIDGHPGV WAAVNGPSS

751 TVISGPPEGI AAVVADAQER GLRARAVASD VAGHGPQLDA ILDQLTEGLA

801 GIRPAATDVA FYSTVTAGHL TDTTELDTAY WVRNVRRTVR FADTIDALLA 

851 DGYRLFIEVS PHPVLNLALE GLIERAAVPA TWPTLRRDH GDTTQLARAA 

901 AHAFAAGADV DWRRWFPADP APRTVDLPTY AFQRQDFWPA PAGGRSGDPA 

951 GLGLAASGHP LLGASVGLAS GDVHLLSGRV SRQSAAWLDD HWAGQALVP

1001 GAAQVEWVLR AGDDAGCSAL EELTLQTPLV LPDTGGLRIQ VWEAADAHG 

1051 RRDVRLFSRP DDDDAFASTH PWTCHATGVL APAPTDGTNG TRDAADTLDG 

1101 AWPPADAEPV PADDLYAQAD RTGYGYGPAF RGVRALWRHG KDVLAEVTLP 

1151 KEAGDPDGFG IHPALLDAVL QPAALLLPPT DAEQVWLPFA WNDVALHAVR 

1201 ATTVRVRLTP LGERIDQGLR ITVADAVGAP VLTVRDLRSR PTDTGRLAAA 

1251 ATRDRHGLFD LEWIAPENAA ENAAGPARDA SEGWVTLGED AASLADLLAS 

1301 VEAGAPAPQL VAAPVEPDRT DDGLALATHV LDLVQTWLAS PLHDSRLVLV 

1351 TRGAVTDADV DVAAAAVWGL VRSAQSEHPG RFTLIDLGPD DTLAAAMQAA 

1401 HLEEPQLAVH GGEIRVPRLV RATTDPTAPN GTPEADRTAD PSEGLHRNGT

1451 VLITGGTGVL GRLVAEHLVT EWGVRHLLLA SRRGDQAPGS AELRARLSEL

1501 GASVEIAPAD VGDAEAVAAL IASVDPAHPL TGVIHAAGVL DDAVITAQTP 

1551 ESLARVWATK ATAARHLHEA TRETPLDFFV VFSSAAASLG SPGQANYAAA 

1601 NAYCDALVQH RRAQGLAGLS IAWGLWQATS GMTGQLSETD LARMKRTGFA 

1651 ALTDEGGLAL LDAARAHDRA YWAADLDPR AVTDGLSPLL RALTAPATRR 

1701 RVASEGLADG ALATRLAGLD ADGRLRLLTD WREYVAAVL GHGSAARVGV 



1751 DIAFKDLGFD SLTAVELRNR LSAACDVRLP ATLIFDHPTP QALATHLVDR

1801 LAGSTSATTT VNATAPAAAH VAAGADVDAD TDDPVAIVAM TCRFPGGVAS

1851 PDDLWDLLDA RKDAMGAFPT DRGWDLERLF HPDPDHPGTS YTDQGGFLPD 

1901 AGDFDAAFFG INPREALAMD PQQRLLLEAS WEVLERAGID PTTLKGTPTG 

1951 TYVGLMYHDY AKSFPTADAQ LEGYSYLAST GSMVSGRVAY TLGLEGPAVT 

2001 VDTACSSSLV SIHLATQALR HGECDLALAG GVTVMADPDM FAGFSRQRGL 

2051 SPDGRCKAYA AAADGVGFSE GVGVLLLERL SDARRHGRRV LGWRGSAVN

2101 QDGASNGLTA PNGPSQERVI RQALASGGLS SVDVDVVEGH GTGTTLGDPI 

2151 EAQALLATYG QGRPEDRPLW LGSVKSNIGH TQAAAGVAGV IKMVMAMRHG 

2201 WPASLHVDV PSPHVEWDSG AVRLAVESVP WPQVEGRPRR AGVSSFGASG 

2251 TNAHVIVESV PDGLEEDSVS VGGEALETET DGRLVPWWS ARSPQALRDQ

2301 ALRLRDFASD ASFRAPLADV GWSLLKTRAL HEHRAVWGA ERAELIAALE

2351 ALATGEPHAA LVGPACSQAR VGGDDWWLF SGQGSQLVGM GAGLYERFPV 

2401 FAAAFDEVCG LLEGPLGVEA GGLREWFRG PRERLDHTVW AQAGLFALQV

2451 GLARLWESVG VRPDWLGHS IGEIAAAHVA GVFDLADACR WGARARLMG

2501 GLPEGGAMCA VQATPAELAA DVDGSAVSVA AVNTPDSTVI SGPSDEVDRI

2551 AGVWRERGRK TKALSVSHAF HSALMEPMLA EFTEAIRGVK FRQPSIPLMS

2601 NVSGERAGEE ITDPEYWARH VRNAVLFQPA IAQVADSAGV FVELGPAPVL

2651 TTAAQHTLDE SDSQESVLVA SLAGERPEES AFVEAMARLH TAGVAVDWSV

2701 LFAGDRVPGL VELPTYAFQR ERFWLSGRSG GGDAATLGLV AAGHPLLGAA 

2751 VEFADRGGCL LTGRLSRSGV SWLADHWAG AVLVPGAALV EWALRAGDEV

2801 GCVTVEELML QAPLWPEAS GLRVQVWEE AGEDGRRGVQ IYSRPDADAV 

2851 GGDDSWICHA TGVLSPESAR LDTELGGVWP PAGAEPLDVD GFYAQAGEAG 

2901 YGYGPAFRGL RAVWRHGQDL LAEWLPEAA GAHDGYGIHP ALLDATLHPL
AQLAPDGDTP AGAEATDPVL RDLAKLENAL SSTLVEHLDA DAVTARLEAL
4001 LSNWKAASAA PGSGSTKEQL QVATTDQVLD FIDKELGV*





MonAV, polyketide synthase multienzyme MONS 5, housing extension 
modules 7 and 8 Length: 4107 amino acids 



1 MASEEELVDY LKRVAAELHD TRQRLREVED RRQEPVAVVG MACRFPGGIE 
51 TPEGLWELVA AGDDAIEPFP TDRGWDLEGI YHPDPDHPGT CYVREGGFLA 
101 APDRFDSDFF GFSPREALAS SPQLRLLLET SWEALERAGI NPASLKGSPT
151 GVYVGAATTG NQTQGDPGGK ATEGYAGTAP SVLSGRLSFT LGLEGPAVTV 
201 ETACSSSLVA MHLAANALRQ GECDLALAGG VTVMSTPEVF TGFSRQRGLA 
251 PDGRCKPFAA AADGTGWGEG AGLILLERLS DARRKGHKVL AVIRGSAINQ 
301 DGASNGFTAP NGPSQRRVIR QALSSAHLST SEIDWEAHG TGTRLGDPIE 
351 AEALIATYGK EREDDRPLWL GSVKSNIGHT QAAAGVAGVI KMVMALQREL 
401 LPATLNVDEP TPHVQWEGGG VRLLTEPVPW SRGERPRRAG ISSFGISGTN 
451 AHWLEEAPP EEDVPGPVAA EPEGWPWW SARTEEALSE QARRLGEFVA 
501 DTDPSTADVG WSLTTSRAIL EHRAVWGRD RDALTAGLAA LAAGEESADV 
551 VAGVAGDVGP GPVLVFPGQG SQWVGMGAQL LDESPVFAAR IAECEQALSA 
601 YVDWSLSAVL RGDGSELSRV EVVQPVLWAV MVSLAAVWAD YGVTPAAVIG 
651 HSQGEMAAAC VAGALSLEDA ARWAVRSDA LRQLMGQGDM ASLGASSEQA
701 AELIGDRPGV CIAAVNGPSS TVISGPPEHV AAWADAEER GLRARVIDVG 
751 YASHGPQIDQ LHDLLTDRLA DIRPATTDVA FYSTVTAERL TDTTALDTDY 
801 WVTNLRQPVR FADTIDALLA DGYRLFIEAS AHPVLGLGME ETIEQADIPA 
851 TWPTLRRDH GDTTQLTRAA AHAFTAGATV DWRRWFPADP TPRTIDLPTY
901 AFQRRSYWLP VDGVGDVRSA GLRRVEHSLL PAALGLADGA LVLTGRLAAS 
951 GGGGGWLADH AVAGTTLVPG AALVEWALRA ADEAGCPSLE ELTLQAPLVL 
1001 PGSGGLQVQV WGPADGQGG RREVRVFSRV DSDDEAAGQD EGWSCHATGV
1051 LSPEPGAVPD GLSGQWPPTG AEPLEISDLY EQAASAGYEY GPSFRGLRSV 
1101 WRHGHNLLAE VELPEQAGAH DDFGIHPVLL DAALHPALLL DQNAPGEEQE 
1151 PAQPALRLPF VWNGVSLWAT GAATVRVRLA PHGGGETDDS AGLRVTVADA 



1201 TGAPVLSVDS LALRPADPEL LRTAGRAGSG TNGLFTVEWT ALPPADVADH
1251 AAGDGWAVLG QDVPDWAGAD MPRHPDMASL SAALDEGTQA PAAVFVETTA
1301 TSHATPNTAA DVTLDASGRA VAERTLHLLR DWLAEPRLAE TRLVLITHHA
1351 VTTPADDDVN AAPLDVPAAA LWGLIRSAQA EHPDRFVLLD TDAKANTDPG
1401 PDTSTDHSTA SGTYRTVIAR ALATGEPQLA VRAGELLAPR LARAATPTPE
1451 TPTPETQPDT GSGSEAGAGS GSGPGATLDP DGTVLIAGGT GMMGGLVAEH
1501 LVRAWSVRHL LLVSRQGPDA PDARDLADRL VGLGATVRIV AADLTDGRAT
1551 ADLVASVDPA HPLTGVIHAA GVLDDAVVTA QTSDQLARVW AAKASVAANL
1601 DAATSELPLG LFLMFSSAAG VLGNAGQAGY AAANAFVDAL VGRRRATGLP
1651 GLSIAWGLWA RGSAMTRHLD DADLARLRAG GVKPLLDEQG LALLDAARAT
1701 AAHTSLVVAA GIDVRGLNRD DVPAILRDLA GRTRRRAAAD STVDQAALER
1751 RLTGLDEAER RAVVTDWRE CVAAVLGHRS AADVRTEANF KDLGFDSLTA
1801 VQLRNRLSAA SGLRLPATLA FDHPTPQALA AYLGTRLSGR TATPVAPVAP
1851 SAAATDEPVA IVAMACKYPG GATSPEGLWD LVAEGVDAVG AFPTGRGWDL
1901 ERLFHPDPDH PGTSYADEGA FLPDAGDFDA AFFGINPREA LAMDPQQRLL
1951 LEASWEVLER AGIDPTTLKG TPTGTYVGVM YHDYAAGLAQ DAQLEGYSML
2001 AGSGSWSGR VAYTLGLEGP AVTVDTACSS SLVSIHLAAQ ALRQGECTLA
2051 LAGGVTVMAT PEVFTGFSRQ RGLAPDGRCK PFAAAADGTG WGEGVGVLLL
2101 ERLSDARRHG RRVLGWRGS AVNQDGASNG LTAPNGPSQE RVIRQALASG
2151 GLSSVDVDVV EGHGTGTTLG DPIEAQALLA TYGQGRPVDR PLWLGSVKSN
2201 IGHTQAAAGV AGVIKMVMAM RHGWPASLH VDVPSPHVEW DSGAVRLAVE
2251 SVPWPEVEGR PRRAGVSSFG ASGTNAHVIV ESVPDGLGED SVSVSGEAPE
2301 TETDGRLVPW WSARSPQAL RDQALRLRDA VAADSTVSVQ DVGWSLLKTR
2351 ALFEQRAVW GRERAELLSG LAVLAAGEEH PAVTRSREDG VAASGAWWL



2401 FSGQGSQLVG MGAGLYERFP VFAAAFDEVC GLLEGPLGVE AGGLREWFR
2451 GPRERLDHTM WAQAGLFALQ VGLARLWESV GVRPDVVLGH SIGEIAAAHV
2501 AGVFDLADAC RVVGARARLM GGLPEGGAMC AVQATPAELA ADVDDSGVSV
2551 AAVNTPDSTV ISGPSGEVDR IAGVWRERGR KTKALSVSHA FHSALMEPML
2601 AEFTEAIREV KFTRPKVSLI SNVSGLEAGE EIASPEYWAR HVRQTVLFQP
2651 GIAQVASTAG VFVELGPGPV LTTAAQHTLD DVTDRHGPEP VLVSSLAGER
2701 PEESAFVEAM ARLHTAGVAV DWSVLFAGDR VPGLVELPTY AFQRERFWLS
2751 GRSGGGDAAT LGLVAAGHPL LGAAVEFADR GGCLLTGRLS RSGVSWLADH
2801 WAGAVLVPG AALVEWALRA GDEVGCVTVE ELMLQAPLW PEASGLRVQV
2851 WEEAGEDGR RGVQIYSRPD ADAVSGDDSW ICHATGTLTP QHTDAPNDGL
2901 AGAWPAAGAV PVDLAGFYER VADAGYAYGP GFQGLRAVWR HGQDLLAEVV
2951 LPEAAGAHDG YGIHPALLDA TLHPALLLDW PGEVQDDDGK VWLPFTWNQV
3001 SLRAAGAATV RVRLSPGEHD EAEREVQVLV ADATGTDVLS VGSVTLRPAD
3051 IRQLQAVPGH DDGLFSVDWT PLPLSRTDVS QTDADGDADW WLSDGVGSL
3101 ADWSAAGGE APWAVVAPVG ASAGGGLAGF DRREGLDGRL WERVLSLVQ
3151 EFLAAPELAE SRLLVLTRGA VATGGDGDGD VDASAAAVWG LVRSAQSENP
3201 GRFILLDVDM DVDVDVDMDV DVDVDVDVDV DGDGNGSDLD PDLNGRRLPH
3251 ATLRHAAEEL DEPQLALRDG QLLVPRLVRA TGGGLWAPT DRAWRLDKGS
3301 AETLESVAPV AYPGVMEPLG PGQVRLGIHA AGINFRDVLV SLGMVPGQVG
3351 LGGEGAGWT ETGPDVTHLS VGDRVMGVLH GSFGPTAVAD TRMVAPVPQG
3401 WDMRQAAAMP VAYLTAWYGL VELAGLKAGE RVLIHAATGG VGMAAVQIAR
3451 HLGAEVFATA SAAKHWLEE MGIDAAHRAS SRDLAFEDTF RQATDGRGMD
3501 WLNSLTGEF IDASLRLLGD GGRFLEMGKT DVRTPEEVAA EYPGVTYTVY 
3551 DLVTDAGPDR IAVMMSELGE RFASGALDPL PVRSWPLDKA REAFRFMSQA 



3601 KHTGKLVLDV PAPLDPDGTV LITGGTGALG QWAEHLVRE WGVRHLLLAS
3651 RRGLDAPGSG ELADRLSDLG AEVTVAAADV SDPASVVELV GKTDPSHPLT
3701 GWHAAGVLE DGIVTAQTPE GLARVWAAKA AAAANLHEAT REMRLGLFW
3751 FSSAAATLGS PGQANYAAAN AYCDALMQRR RAAGQVGLSV GWGLWEAPDA
3801 KPGVAADAKP DVAADAKTGV AADGTPQGMT GTLSGTDVAR MARIGVKAMT
3851 SAHGLALLDA AHRHGRPHLV AVDLDTRVLA HKPAPALPAL LRAFAGDQGG
3901 QGGGRGGGRG GGPARPAAAT TRQNVDWAAK LSVLTAEEQH RTLLDLVRTH
3951 AAAVLGHAGT DAVRADAAFQ DLGFDSLTAV ELRNRLSAST GLRLPATFIF
4001 RHPTPSAIAD ELRAQLAPAG ADPAAPLFGE LDKLETVTTG
4051 LAARLQNLLW RLDDTSARSD HAAGASDADG DAVENRDLES ASDDELFELI
4101 DRELPS*











MonAVI, polyketide synthase multienzyme MONS 6, housing extension 
module 9 Length: 1701 amino acids 



1 MPGTNDMPGT EDKLRHYLKR VTADLGQTRQ RLRDVEERQR EPIAIVAMAC
51 RYPGGVASPE QLWDLVASRG DAIEEFPADR GWDVAGLYHP DPDHPGTTYV
101 REAGFLRDAA RFDADFFGIN PREALAADPQ QRVLLEVSWE LFERAGIDPA
151 TLKDTLTGVY AGVSSQDHMS GSRVPPEVEG YATTGTLSSV ISGRIAYTFG
201 LEGPAVTLDT ACSASLVAIH LACQALRQGD CGLAVAGGVT VLSTPTAFVE
251 FSRQRGLAPD GRCKPFAEAA DGTGFSEGVG LILLERLSDA RRNGHQVLGV
301 VRGSAVNQDG ASNGLTAPND VAQERVIRQA LTNARVTPDA VDAVEAHGTG
351 TTLGDPIEGN ALLATYGKDR PADRPLWLGS VKSNIGHTQA AAGVAGVIKM
401 VMAMRHGELP ASLHIDRPTP HVDWEGGGVR LLTDPVPWPR ADRPRRAGVS
451 SFGISGTNAH LIVEQAPAPP DTADDAPEGA ATPGASDGLV VPWWSARSP
501 QALRDQALRL RDFAGDASRA PLTDVGWSLL RSRALFEQRA WAGRERAEL
551 LAGLAALAAG EEHPAVTRSR EEAAVAASGD WWLFSGQGS QLVGMGAGLY



601 ERFPVFAAAF DEVCGLLEGE LGVGSGGLRE VVFWGPRERL DHTVWAQAGL 
651 FALQVGLARL WESVGVRPDV VLGHSIGEIA AAHVAGVFDL ADACRVVGAR 
701 ARLMGGLPEG GAMCAVQATP AELAADVDGS SVSVAAVNTP DSTVISGPSG 
751 EVDRIAGVWR ERGRKTKALS VSHAFHSALM EPMLGEFTEA IRGVKFRQPS 
801 IPLMSNVSGE RAGEEITSPE YWARHVRQTV LFQPGVAQVA AEARAFVELG 
851 PGPVLTAAAQ HTLDHITEPE GPEPVVTASL HPDRPDDVAF AHAMADLHVA 
901 GISVDWSAYF PDDPAPRTVD LPTYAFQGRR FWLADIAAPE AVSSTDGEEA 
951 GFWAAVEGAD FQALCDTLHL KDDEHRAALE TVFPALSAWR RERRERSIVD 
1001 AWRYRVDWRR VELPTPVPGA GTGPDADTGL GAWLIVAPTH GSGTWPQACA 
1051 RALEEAGAPV RIVEAGPHAD RADMADLVQA WRASCADDTT QLGGVLSLLA 
1101 LAEAPATSSD TTSHTSTSCG TGSLASHGLT GTLTLLHGLL DAGVEAPLWC 
1151 ATRGAVSCGD ADPLVSPSQA PVWGLGRVAA LEHPELWGGL VDLPADPESL 
1201 DASALYAVLR GDGGEDQVAL RRGAVLGRRL VPDATPDVAP GSSPDVSGGA 
1251 AHADATSGEW QPHGAVLVTG GVGHLADQW RWLAASGAEH WLLDTGPAN
1301 SRGPGRNDDL AAEAAEHGTE LTVLRSLSEL TDVSVRPIRT VIHTSLPGEL 
1351 APLAEVTPDA LGAAVSAAAR LSELPGIGSV ETVLFFSSVT ASLGSREHGA 
1401 YAAANAYLDA LAQRAGADAA SPRTVSVGWG IWDLPDDGDV ARGAAGLSRR
1451 QGLPPLEPQL ALGALRAALD GGKGHTLVAD IEWERFAPLF TLARPTRLLD
1501 GIPAAQRVLD ASSESAEASE NASALRRELT ALPVRERTGA LLDLVRKQVA 
1551 AVLRYEPGQD VAPEKAFKDL GFDSLWVEL RNRLRAATGL RLPATLVYDY 
1601 PTPRTLAAHL LDRVLPDGGA AELPVAAHLD DLEAALTDLP ADDPRRKGLV 
1651 RRLQTLLWKQ PDAMGAAGPA DEEEQAAPED LSTASADDMF ALIDREWGTR 
1701 * 

MonH, probable regulatory protein Length: 981 amino acids 



1 


w tjKjv £ji\U V uu 


r-lo u V o^bUbL 


flpT \7T7 l D A P* A T 

nbL V £i KAJl. A±j 


A A T O /""* A XT' r*\/~" O 

AALRGAFDGS 


PGTGGSLWL 


s i 
D x 


obM. V b 1 bt\m 


T T D A. Ta7 A PiD T r* 
Xj Xj KA WA U K X b 


A Pi A Pi A T \7T T» A 
A U A DAXj V Xj I A 


T AC RAE R DL P 


LGVLEQLVRS 


i n i 

1U1 


it b Xj Jr ir A o A Lj K 


7A T A Ta7Ta7 Pt IT IT A C 
AXjAWWU£jEjAb 


AI rbKI UANb 


T SAN GT DANG 


TGAGQTGAGQ 


i si 


Ab V by lb V bb 


TTD\7T 7A7AC7AT D 


fT fTT'TT'T T~> PvT T 

bXibh VxjKDLiXj 


AERPVVVAVD 


DAHHADAASL 


9 m 


y bXjXiO V V r\r\Xj 


DCTvpT U\7T TTT 


tj I Any KAy NA 


T T O O r~" r~"T r in n 

LrLbSEFLHEP 


ALRRIRLEPL 


9 m 

£D ± 


Q k'' A. ZtT" 7A T T 7i 
o r\Ab V CjAXjXjA 


out pvtr DTa^n 


XjI rv VrlblYlbA 


GHPLLVRALA 


EDHRAAGGAG 




Hjcx i \j rvrt v Lor 


Xj X l\Xi J_j 1 r V 1 y 


VrlKAX AAXjbA 


I_J A DrO\ rpnr 

riAb ir by V b KJj 


T PvT7P\A A O T 7T —> T~> 

IjDVDAASVER 


^ si 


AVKyXjl VALjV 


Xj rt t» b KXj b ri tr A 


CAAAWT n^MD 

t AAAVxjUbiyiF 


PEERRALHGR 


VADLLHEEGA 


*± U X 


D 7\ TIT1 7 7\ 7\ LJT T 7 

Ir Al hjVAAriljV 


A A PiD CPi7\ T~)r*7 A 


V PVF QEAAQL 


ALDEDQVETG 


VDYLRAAHQR 


4 s i 


b Kb/i/\y KAA V 


V b AXjA UriLj W K 


T Pi P> A L^T 7T OUT 

Xj u r AK V Li KH L 


PDPAAMAPQT 


DPAALAPHTD 


SOI 


ir/Air 1 AAir 1 AA 


triirlirXlrl 1 r 


rxjir'IHLijWHb 


RVEEGLDAIG 


TLTGPGPNPA 


SRI 

J Z) X 


b/-\£T xrL v lLM ir r\L)±j 


Ui rr WXjWbril Xj 


I irbrlvrS.il.Kljb 


bbAlib ry Kb I 


P PAVT PELQG 


DUX 


ApfPT MM RT T U 
rib 1 XjIXIN UXjXjri 


bbH,KDAX EjAA 


bKAljNK X KLib 


P RT I AVQ TAA 


LAALTYRDRP 


£ S 1 


riKAAAWb Ublj 


VAyADriKN b ir 


T WRAL F T Aw R 


ALLHLRQGDP 


AAAEQRAETA 


/ u X 


T 7A T T C t<TT*7f~' 

L/\Xiiboi\bl/vb 


A A T fT DT AAA 

AAxbljirxjAAA 


VQ A tsAAL G D V 


I~\ TV TV TV T T T~ 1 T*V 

DGAAALLERP 


VPQAVFQTRT 


7 S 1 

/ox 


bXjfl I XiAAKbK 


I riXjAi bbniA 


AJLibUr YACbx 


RMS SWGVDLP 


ALE PWRLG AA 




r7\VT A T XPr^T 
EjA X XjAXjbiljbXj 


XiAKy Xj V JJby Xj 


tit nTTirvr^rTiTi 

PLPTPDDGRT 


WGMTLRLRAA 


TSPAPARAEL 


ft S 1 

O J X 




obUl r &XjAKA 


T 7 A T\r\ 7\ \ T7\ \ 7TD TT 

VAUyAVAVKii, 


GGEAERARLL 


T\ T~* T S TV T - ' T T TV n n 

ARKAELLARR 


901 


WGSAPAPATV 
RRLDLQAALG

MonCI, flavin-dependent epoxidase Length: 496 amino acids


1 VTTTRPAHAV VLGASMAGTL AAHVLARHVD AVTVVERDAL PEEPQHRKGV
51 PQARHAHLLW SNGARLIEEM LPGTTDRLLA AGARRLGFPE DLVTLTGQGW
101 QHRFPATQFA LVASRPLLDL TVRQQALGAD NITVRQRTEA VELTGSGGGS



151 GGRVTGVWR DLDSGRQEQL EADLVIDATG RGSRLKQWLA ALGVPALEED

201 WDAGVAYAT RLFKAPPGAT THFPAVNIAA DDRVREPGRF GWYPIEGGR 

251 WLATLSCTRG AQLPTHEDEF IPFAENLNHP ILADLLRDAE PLTPVFGSRS 

301 GANRRLYPER LEQWPDGLLV IGDSLTAFNP IYGHGMSSAA RCATTIDREF 

351 ERSVQEGTGS ARAGTRALQK AIGAAVDDPW ILAATKDIDY VNCRVSATDP 

401 RLIGVDTEQR LRFAEAITAA SIRSPKASEI VTDVMSLNAP QAELGSNRFL 

451 MAMRADERLP ELTAPPFLPE ELAWGLDAA TISPTPTPTP TAAVRS* 
MonBII, carbon-carbon double bond isomerase Length: 141 amino acids 

1 MPDEAARKQM AVDYAERINA GDIEGVLDLF TDDIVFEDPV GRPPMVGKDD 

51 LRRHLELAVS CGTHEVPDPP MTSMDDRFW TPTTVTVQRP RPMTFRIVGI 

101 VELDEHGLGR RVQAFWGVTD VTMDDPAGPA DTTHPEGIRA * 

MonBI, carbon-carbon double bond isomerase Length: 144 amino acids 

1 MNEFARKKRA LEHSRRINAG DLDAIIDLYA PDAVLEDPVG LPPVTGHDAL 
51 RAHYEPLLAA HLREEAAEPV AGQDATHALI QISSVMDYLP VGPLYAERGW 
101 LKAPDAPGTA RIHRTAMLVI RMDASGLIRH LKSYWGTSDL TVLG* 

MonAVIII, polyketide synthase multienzyme MONS 8, housing extension 
modules 11 and 12 Length: 3754 amino acids 

1 MSNEEKLLDH LKWVTAELRQ ARQRLHDKES TEPVAIVGMA CRYPGGARSA 

51 EDLWELVRDG GDAVAGFPDD RGWDLESLYH PDPEHPATSY VRDGAFLYDA 

101 GHFDAEFFGI SPREATAMDP QQRLLLETAW EAIEHAGMNP HALKGSDTGV 

151 FTGVSAHDYL TLISQTASDV EGYIGTGNLG SWSGRISYT VGLEGPAVTV 

201 DTACSSSLVA IHLASQALRQ GECSLALAGG STVMATPGSF TEFSRQRGLA 

251 PDGRCKPFAA AADGTGWGEG AGWALELLS EARRRGHKVL AVIRGSATNQ 

301 DGTSNGLAAP NGPSQERVIR AALANARLSA EDIDAVEAHG TGTTLGDPIE 



351 AQALIATYGQ GRPEDRPLWL GSVKSNIGHT QAAAGVAGVI KMVMAMRNGL

401 LPTSLHIDAP SPHVQWEQGS VRLLSEPVDW PAERTRRAGI SAFGISGTNA 

451 HLILEEAPPE EDAPGPVAAE PGGVVPWVVS GRTPDALREQ ARRLGEFAAG

501 LADASVSEVG WSLATTRALF DQRAVWGRD LAQAGASLEA LAAGEASADV 

551 VAGVAGDVGP GPVLVFPGQG SQWVGMGAQL LDESPVFAAR IAECEQALSA 

601 HVDWSLSDVL RGDGSELSRV EVVQPVLWAV MVSLAAVWAD YGITPAAVIG 

651 HSQGEMAAAC VAGALSLEDA ARIVAVRSDA LRQLQGHGDM ASLSTGAEQA 

701 AELIGDRPGV WAAVNGPSS TVISGPPEHV AAWADAEAQ GLRARVIDVR

751 YASHGPQIDQ LHDLLTDRLA DIQPTTTDVA FYSTVTAERL DDTTALDTAY
801 WVTNLRQPVR FADTIEALLA DGYRLFIEAS PHPVLNLGIQ ETIEQQAGAA

851 GTAVTIPTLR RDHGDTTQLT RAAAHAFTAG APVDWRRWFP ADPTPRTVDL
901 PTYAFQHKHY WVEPPAAVAA VGGGHDPVEA RVWQAIEDLD IDALAGSLEI 
951 EGQAESVGAL ESALPVLSAW RRRHREQSTV DSWRYQVTWK HLPDVPAPEL 
1001 SGAWLLLVPA AHADHPAVLA TAQTLTAHGG EVRRHVVDAR AMERTELAQE 
1051 LRVLMDGAAF AGWNLLALD EEPHPEHSAV PAGLAATTAL VQALADNGAD 
1101 IAVRTLTQGA VSTSAGDALT HPVQAQVWGL GRVAALEYPR LWGGLVDLPA 
1151 RIDHQTLARL AAALVPQDED QISIRPSGVH ARRLAHAPAN TVGSGLGWRP 

1201 DGTTLITGGT GGIGAVLARW LARAGAPHLL LTSRRGPDAP GAQELAAELT
1251 ELGAAVTVTA CDVGDREQVR RLIDDVPAEH PLTAVIHAAG VPNYIGLGDV 
1301 SGAELDEVLR PKALAAHHLH ELTRELPLSA FVMFSSGAGV WGSGQQGAYG 

1351 AANHFLDALA EHRRAEGLPA TSIAWGPWAE AGMAADQAAL TFFSRFGLHP 
1401 LSPELCVKAL QQALDAGETT LTVANFDWAQ FTSTFTAQRP SPLLADLPEN
1451 RRASAPAAQQ EDATEASSLQ QELTEAKPAQ QRQLLLQHVR SQAAATLGHS
1501 DVDAVPATKP FQELGFDSLT AVELRNRLNK STGLTLPTTV VFDHPTPDAL 



1551 TDVLRAELSG DAAASADPVR AAGASRGAAD DEPIAIVGMA CRYPGDVRSA

1601 EELWDLVAAG KDAMGAFPDD RGWDLETLYD PDPESRGTSY VREGGFLYDA 

1651 GDFDAGFFGI SPREAVAMDP QQRLLLETAW EAIERAGLDR ETLKGSDAGV 

1701 FTGLTIFDYL ALVGEQPTEV EGYIGTGNLG CVASGRVSYV LGLEGPAMTI

1751 DTGCSSSLVA IHQAAHALRQ GECSLALAGG ATVMATPGSF VEFSLQRGLA 

1801 KDGRCKPFAA AADGTGWAEG VGLVVLERLS EARRNGHNVL AVIRGSAINQ 

1851 DGTSNGLTAP NGQAQQRVIR QALANARLSA EDVDAVEAHG TGTMLGDPIE 

1901 ASALVATYGK ERPADRPLWL GSIKSNIGHA QASAGVAGVI KMVMALRNEQ 

1951 LPASLHIDAP TPHVDWDGSG VRLLSEPVSW PRGERPRRAG VSAFGISGTN 

2001 AHLILEQAPD APEPVTAPAE DAAAPAGWP WWSARGEEA LRAQARLLAD 

2051 RATADPRLAS PLDVGWSLVK TRSVFENRAV WGKDRQTLL AGLRSLAAGE 

2101 PSPDWEGAV QGASGAGPVL VFPGQGSQWV GMGAQLLDES PVFAARIAEC 

2151 ERALSAHVDW SLSAVLRGDG SELSRVEWQ PVLWAVMVSL ASVWADYGIT 

2201 PAAVTGHSQG EMAAACVAGA LSLEDAARIV AVRSDALRQL MGQGDMASLG 

2251 AGSEQVAELI GDRPGVCVAA VNGPSSTVIS GPPEHVAAW ADAEARGLRA 

2301 RVIDVGYASH GPQIDQLHDL LTERLADIRP TTTDVAFYST VTAERLDDTT 

2351 TLDTDYWVTN LRQPVRFADT IEALLADGYR LFIEASPHPV LNLGMEETIE 

2401 RADMPATWP TLRRDHGDAA QLTRAAAQAF GAGAEVDWTG WFPAVPLPRV 

2451 VDLPTYAFQR ERFWLEGRRG LAGDPAGLGL ASAGHPLLGA AVELADGGSH 

2501 LLTGRISPRD QAWLAEHRVM DTVLLPGSAF VELALQAAVR AGCAELAELT

2551 LHTPLAFGDE GAGAVDVQW VGSVAEDGRR PVTVHSRPTG EGEEAVWTRH 

2601 AAGWAPPGP DAGDASFGGT WPPPGATPVG EQDPYGELAS YGYDFGPGSQ 

2651 GLVSAWRLGD DLFAEVALPE AESGRADRYQ VHPVLLDATL HALILDAVTS 

2701 SADTDQVLLP FSWSGLRVHA PGAEKLRVRI ARTAPDQLAL TAVDGGGGGE 



2751 PVLTLESLTV RPVAAHQIAG ARAADRDALF RLVWMEVAAR AEETGGGAPR

2801 AAVLAPVESG PMGGTSAGAL ADALSDALAA GPVWDTFGAL RDGVAAGGEA

2851 PDWLAVCAA PGAGAGAVAD ADGRGGDPAG YARLATVSLL SLLKEWVDDP

2901 AFAATRLWV TRGAVAARPG ETAGDLAGAS LWGLVRSAQA ENPGRLTLLD

2951 VDGLESSPAT LTGVLASGEP ELALRDGRAY VPRLVRDDAS VRLVPPVGSL

3001 TWRLARCQEA GGGQQLSLVD APEAGRALEP HEVRVAVRAA APGPLTAGQV 

3051 EGAGWTEVG GEVGSVAVGD RVMGLFDAVG PVAVTDAALL MPVPAGWSWA

3101 QAAGSLGAYV SAYHVLADW APRGGETLLV GEETGSVGRA VLRLALAGRW 

3151 RVEAVDGAST ADDSGAERAA DVTLRHEGAL WHRAGGRPD EGQAVVPPEP 

3201 GRVREILAEL TELTELAEIT ESAEPGLPAE RGDSRALTPL DITVWDIRQA

3251 PAAMAAPPSA GTTVFSLPPA FDPEGTVLVT GGTGALGSLT ARHLVERYGA 

3301 RHLLLSSRRG ADAPGALELA ADLSALGARV TFAACDPGDR DEAAALLAAV 

3351 PSDHPLTAVF HCAGTVNDAV VQNLTAEQVE EVMRVKADAA WHLHELTRDA 

3401 DLSAFVLYSS VAGLLGGPGQ GSYTAANAFL DALARHRHDG GAAATSLAWG 

3451 YWELASGMSG RLTDADRARH ARAGWGLGA DEGLALLDAA WAGGLPLYAP

3501 VRLDLARMRR QAQSHPAPAL LRDLVRGGSK SGGGAVSAGA AALLKSLGAM 

3551 SDPEREEALL DLVCTHIAAV LGYDAATPVN ATQGLRELGF DSLTAVELRN

3601 RLSAATGLKL PATFVFDHPN PAELAAQLRQ ELAPRAADPL ADVLAEFERI

3651 EDSLLSVSSK DGSARAELAG RLRATLARLD APQDTAGEVA VATRTRIQDA 

3701 SADEIFAFID RDLGRDGASG QGNGQPTGQG NGHGNGNGNG NGNGHGQAVE

3751 GQR* 

MonAVII, polyketide synthase multienzyme MONS 7, housing extension 
module 10 Length: 1642 amino acids 

1 MAHTEEKLLE YLKRVTADLR QTERRLQDVE SAGHEPVAVI GMACRLPGGV 
51 RSPEEFWELV STGGDAVAPL PGNRNWDLDS LYDPDPESTG TSYVREGGFV 




101 YDAGDFDPTF FGIGPTEAAA MAPQQRLALE TAWEAIERAG IDPLSLRSSD 
151 TSTFIGCDGL DYALGASEVP EGTAGYFTIG NSGSVTSGRV AYTLGLEGPA 
201 VTVDTACSSS LVSLHLATQA LRTQECSLAL AGGTYVMSSP APLIGFSELR 
251 GLAPDGRCKP FSASSDGMGM AEGTGWLLE RLSDARRKGH KVLAVIRGSA
301 INQDGASNGL TAPNGPAQER VIRAALANAR LAPEDIDAVE AHGTGTTLGD 
351 PIEAGALISA YGRERPEDRP LWVGAVKSNI GHTQIAAGVA GVIKMVLALR 
401 HDLLPAILHV DAPSPHVEWD GSGLRLLTDP VKWPRGERPR RAGVSSFGFS 
451 GTNAHLILEE APPEEEDVPG SVAEEPGGW PWWSGRTPD ALRAQARRLG 
501 EFAAGPADAS AADVGWSLTT TRSVFEHRAV VVGRDRDALT AGLGALAAGE 
551 ASAGVVAGVA GDVGPGPVLV FPGQGSQWVG MGAQLLDESP VFAARIAECE 
601 RALSAYVDWS LSAVLRGDGS ELSRVEWQP VLWAVMVSLA AVWADYGVTP 
651 AAVIGHSQGE MAAACVAGAL SLEDAARIVA VRSDALRRLQ GHGDMASLST 
701 GAEQAAELIG DRPGWVAAV NGPSSTVISG PPEHVAAWA DAEARGLRAR 
751 VIDVGYASHG PQIDQLHDLL TERLADIRPA NTDVAFYSTV TAERLTDTTA 
801 LDTDYWVTNL RQPVRFADTI EALLADGYRL FIEASAHPVL GLGMEETIEQ 
851 ADIPATVVPT LRRDHGDTTQ LTRAAAHAFT AGAPVDWRRW FPADPTPRTV 
901 DLPTYAFQHQ HYWLERSASA SGAVSGEQSA AEAQLWHAVE ELDLGLLAET 
951 LGSEEGSEEA VRALEPALPV LKGWRRRHQD QATIDSWRYR VTWKQRSDGP 
1001 APELGGDWLL FVPADKAEHP AVRATAEALS EHGAAAVRLH PVETGRAGRQ 
1051 ELAAVDTAGL AGIVNLLALD EEPHPEHPAV PAGLAATTAL LQALGDNGTT 
1101 APLHTVTQGA VSTGATDPLT HPLQAHVWGL GRVAALEHPR LWAGLVDLPA 
1151 RIDRHTLPRL AAALLPQDDE DQTAVRPTGI HHRRLTHAVG SIQNPVHSEA 
1201 TWRPRGTTLI TGGTGGIGAV LARWLARQGA PRLHLTSRRG PDAPGARELA 
1251 AELDGLGTAV TITACDVSDP RQLSGLIDDM PAEHPLTAVI HAAGMTDLTA



1301 IGDLTTARLG EVLGSKSDAA WNLHELTRDL DLSAFVMFSS GAGVWGSGQQ

1351 GAYGAANHFL DALAEHRRAQ GLPATSIAWG PWAEAGMSAD PESLTYFKRF 

1401 GLLPIAPDLC VKALHQAVDA GDATLTVANF DWAKFTPTFT AQRPSPFLDD 

1451 LPENQREAEQ TGTAAETSAF REELAKTPAS QRLGFLVQQV RTYAAATLGR

1501 TVEDIPAAKP FQELGFDSLT AVQLRNQLNT TTGLSLPATV IFDHPTPEAL

1551 ATHLRGQLGD GAEVAGEGDV LAALDKWDTA FGAAEVDEAA RRRIVGRLQV 

1601 LVSKWSPAQD GPEGTDSAHA DLEAASADDI FDLISSEFGK S*
MonD, cytochrome P450 hydroxylase Length: 431 amino acids 

1 VGLTVGPDNA KRGIVPITDS KPAATFPDLV DPSFWARPHA ERVALFEEMR 

51 GLPRPAFIRQ NMPGVPWTFG YHALVKYADI VEVSRRPQDF SSNGATTIIG 

101 LPPELDEYYG SMINMDNPEH 5RLRRIVSRS FGRNMIPEFE AVATRTARRI

151 IDELIARGPG DFIRPVAAEM PIAVLSDMMG IPAEDHDFLF DRSNTIVGPL 

201 DPDYVPDRAD SERAVIEASR ELGDYIAGLR AERLAAPGND LITKLVQVQA 

251 DGEQLTRQEL VSFFILLVIA GMETTRNAIS HALVLLTEHP EQKQLLLSDF 

301 DTHAPNAVEE ILRVSTPINW MRRVATRDCD MNGHRFRRGD RIFLFYWSGN 

351 RDESVFPDPY RFDITRGTNA HVTFGAVGPH VCLGAHLARM EITVLYRELL 

401 AALPQIHAVG QPRRLDSSFI EGIKHLHCAF * 

MonRI, probable activator protein Length: 268 amino acids 

1 VRYEMLGPLR IKDGNDYATI NAQKVEIVLT VLLIRADRW SLEQLMREIW 

51 GEDLPRRATA GLHVYISQLR KFLKVPGSAG NPVETRAPGY VLHKRDDDQI 

101 DAQIFPELVD VGRSLLREKR FDEAASCFGQ ALALWRGPIL GQGGNGPGTN 

151 GPIIDGFSTW LTEIRLECQE MLVECQLQLG RHREAVGMLY ALTAENPMCE 

201 AFYRQLMLAL YRSERQADAL KVYQSVRKTL NDELGLEPGR PLQELQEUVIL 

251 AGDMHLMSPP PLALSGR* 



MonAX, thioesterase Length: 278 amino acids 

1 LSAFLAKGKI LSAFPPPDMS DPWIRRFRPR PEAWRLVCF PHAGGSASYY 
51 HPLAQSPTLP TDSEVLAVQY PGRQDRRRER LLDDIGELAD LITDALGPFD 
101 DRPLAFFGHS MGAVLAYEVA QRLRERTGKQ PCRLFVSGRR APSRFRRGTV 
151 HLLDDTELAA ELRRAGGTDP RFLDDEELLA EIIPWRNDY RAVELYRWNP
201 SPPLSCPITA LVGDRDPQAP LDEVEAWQQH TEGPFDLKVF AGGHFYLNTH 
251 QQGVTEVISK ALADSAQQRA TARGNAR* 

ORF29, a homologue of CapK involved in cell wall biosynthesis Length: 428 
amino acids 



1 LADLVAHARS ASPYYRELYH GLPERIEDPT LLPVTDKKQL MDHFDDWPTD
51 RDITFEKVRA FTDDPELIGR RFLGRYLVAT TSGTSGRRGL FVLDDRYMNV
101 SSAVSSRVLA SWLGPLGIAR AVVHGGRFAQ LVATEGHYVG FAGYSRLRQD
151 GEARSKLVRA FSVHEPMSRL VAELNEYRPA FVIGYASTIM LFTAEQEAGR
201 LHIDPVLVEP AGETMTESDT DRIAAAFGAK VRTMYSATEC TYLSHGCAEG
251 WYHVNDDWAV LEPVDADHRP TPPGEFSHTT LISNLANRVQ PFLRYDLGDS
301 VMLRPDPCPC GTPSPAIRVQ GRSGDILTFP SGRGDDVSLA PLAFSSLFDR
351 MPGVELFQIE QTAPSTLRVR WQAPGADAD HVWQRAHDGL THLLADNKLD
401 NVTVERGEEP PRQASGGKYR TIIPLAA*






LipB, lipase B Length: 338 amino acids 






1 VKVPVEVTVR LSSWLGGLVA AVLAATVLPA SAASAADVSS PPLEIPAAEL
51 AKALHCGTEL GDLRDAGDKP TVLFVPGTGL KGEENYAWNY MAELKKKGYQ
101 SCWVDSPGRG LRDMQESVEY WYATRAIQE ATGRKVDLVG HSQGGLLTAW
151 ALRFWPDLPG KVDDMVTLGS PFQGTRLASP CRPIAEVAGC PASVLQFARD
201 SNWSKALGAD GTPMPAGPSY TTIYSYADES VVADGEAPSL PGAHRIGVQD
251 ICPGRPWPTH IAMWDQVSY DLVADAIEHP GPADTSRIDR AHCAKPVMPL
301 NSQEAVDALP GLLNFPIELL IHSQPWVDEE PPLRPYAR* 

ORF31, putative ion pump Length: 309 amino acids 

1 MGHDHGPSAG AAGGTLSGTY RKRLLWTIGI SGSITVIQW GALLSGSLAL 

51 LADAAHSLTD AVGVSLALGA ITLAQRAPTP RRTFGFCRVE IFSAVLNALL 

101 LWIFAWVLW SAIGRFSEPV EVKGGLMFW ALGGLAANLV GLWLLRDAKE 

151 KSLNLRGAYL EVLGDALGSV AVIVGGLVIL LTGWQAADPI ASIVIGLLIV 

201 PRAYGLLRDS LHVLLEATPQ DVDLGEVRRH LLEERGWAV HDLHGWTVTS 

251 GMPVLTAHW VTEEALASGY GELLGRLQRC VGGHFDVAHS TIQLEPEGHV 

301 EEDGALHT*

ORF32, hypothetical membrane protein Length: 364 amino acids 

1 MTRALTLHDW IVAGIAWAG WAGLLLRAL LRWLGERASK TRWSGDDVIV

51 DALRTLVPCA AITAGLAAAA GALPLTPRTG RNVTMTLTAL LILAATLTAA 

101 RIVTGLVKAV AQSRSGVAGS ATIFVNITRV VVLAMGFLIV LQTLGISIAP 

151 LLTALGVGGL AVALALQDTL ANLFAGVHIL AAKTVQPGDY IQLSSGEEGY 

201 WDINWRNTT VRQLSNNLVI IPNAKLAGTN MTNYSRPEQE LSIMVQVGVS 

251 YDSDLEQVEK VTTEWDEVM AEITGAVPDH EAAIRFHTFG DSRISFTVIL 

301 GVGEFSDQYR IKHEFIKRLH QRYRAEGIRV PAPVRTVRVQ QGELPPPLGI
351 PHQRDTSTQA RLH* 

AmtA, glycine amidinotransferase (partial coding sequence) 
Length: 131 amino acids 

1 MSPVNSHNEW DPLEEIIVGR LEGATIPSSH PWACNIPTW AARLQGLAAG 
51 FEYPQRLIEP AQQELDQFIA LLQSLDVTVR RPAAVDHKHR FGTPDWQSRG 
101 FCNSCPRDSM LWGDEIIET PMAWPCRCFE T